Service Discovery like a Pro
Eran Harel
@eran_ha
● Motivation
● Consul Overview
● Consul Architecture
● API
● Alternatives
● Discovery and Client Side LB Demo
Do we have a problem at all?
Hard Coded Topology...
my.target.service.url=http://host:port/contextBase/...
memcached.cluster=mem1:11211,mem1:11311,mem2:11211,mem2:11311
db.host=mysql:3304
kafka.brokers=kafka1:9092,kafka2:9092,kafka3:9092
Why is Hard Coded Topology a problem?
● What do we do if the nodes are dynamically allocated (e.g. when using a scheduler à la Mesos)?
● What do we do when the topology changes?
● What do we do if nodes become unhealthy?
● How can we scale (up/down) our clusters?
● What happens if the port is dynamic?
The Traditional Approaches
● HTTP Load Balancers (e.g. HAProxy)
● TCP Keepalive
● VIP
● etc...
What is Consul?
Service discovery and configuration
made easy. Distributed, highly
available, and datacenter-aware.
● Developed @ HashiCorp
● Open source - https://github.com/hashicorp/consul
● Written in Go
● Current version (at the time of writing): 0.7.2
● https://consul.io/
Features
● Service Discovery
● Scalable Failure Detection (Distributed Health Checks)
● K/V Store
● Load Balancing
● Multi Datacenter
● consul-template (external project)
Basic Architecture
Consul APIs
DNS API
$ host memcached.service.consul
memcached.service.consul has address 10.xx.xx.01
memcached.service.consul has address 10.xx.xx.02
memcached.service.consul has address 10.xx.xx.03
$
$ host test.memcached.service.consul
test.memcached.service.consul has address 10.xx.xx.51
test.memcached.service.consul has address 10.xx.xx.52
$
$ host prod.memcached-legacy.service.dc2.consul
prod.memcached-legacy.service.dc2.consul has address 10.yy.xx.01
prod.memcached-legacy.service.dc2.consul has address 10.yy.xx.02
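The lookups above follow the pattern [tag.]<service>.service[.<datacenter>].consul. Here is a minimal Java sketch of building and resolving such names. ConsulDnsNames is a hypothetical helper, not part of Consul, and resolve() assumes the host's resolver forwards the .consul domain to a Consul agent.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Consul): builds and resolves Consul DNS names. */
final class ConsulDnsNames {

    private ConsulDnsNames() {}

    /** Builds [tag.]<service>.service[.<datacenter>].consul; tag and datacenter may be null. */
    static String serviceName(String tag, String service, String datacenter) {
        StringBuilder name = new StringBuilder();
        if (tag != null) {
            name.append(tag).append('.');
        }
        name.append(service).append(".service");
        if (datacenter != null) {
            name.append('.').append(datacenter);
        }
        return name.append(".consul").toString();
    }

    /** Resolves via the system resolver; assumes *.consul queries are forwarded to Consul. */
    static List<String> resolve(String dnsName) throws UnknownHostException {
        List<String> addresses = new ArrayList<>();
        for (InetAddress address : InetAddress.getAllByName(dnsName)) {
            addresses.add(address.getHostAddress());
        }
        return addresses;
    }

    public static void main(String[] args) {
        // Only builds the names; call resolve(...) on a host whose DNS forwards to Consul.
        System.out.println(serviceName(null, "memcached", null));
        System.out.println(serviceName("test", "memcached", null));
        System.out.println(serviceName("prod", "memcached-legacy", "dc2"));
    }
}
```

A plain DNS lookup returns addresses only; when ports are dynamic, the REST API (or an SRV query) is the better fit.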
REST API
● http://localhost:8500/v1/agent/service/register
● http://localhost:8500/v1/agent/service/deregister/<MyService>
● http://localhost:8500/v1/catalog/service/<MyService>
● http://localhost:8500/v1/catalog/nodes
● http://localhost:8500/v1/health/service/<MyService>
● Certain endpoints support a feature called a "blocking query." A blocking query is
used to wait for a potential change using long polling.
Consul CLI
$ consul
usage: consul [--version] [--help] <command> [<args>]

Available commands are:
    agent          Runs a Consul agent
    configtest     Validate config file
    event          Fire a new event
    exec           Executes a command on Consul nodes
    force-leave    Forces a member of the cluster to enter the "left" state
    info           Provides debugging information for operators
    join           Tell Consul agent to join cluster
    keygen         Generates a new encryption key
    keyring        Manages gossip layer encryption keys
    leave          Gracefully leaves the Consul cluster and shuts down
    lock           Execute a command holding a lock
    maint          Controls node or service maintenance mode
    members        Lists the members of a Consul cluster
    monitor        Stream logs from a Consul agent
    reload         Triggers the agent to reload configuration files
    rtt            Estimates network round trip time between nodes
    version        Prints the Consul version
    watch          Watch for changes in Consul
Consul CLI (cont)
$ consul maint -service Hello0 -enable
Service maintenance is now enabled for "Hello0"
On the server log:
2015/12/09 21:51:13 [INFO] agent: Service "Hello0" entered maintenance mode
2015/12/09 21:51:13 [INFO] agent: Synced check '_service_maintenance:Hello0'
What are the alternatives?
● ZooKeeper, doozerd, etcd
● Chef, Puppet, etc.
● Nagios, Sensu
● SkyDNS
● SmartStack
● Serf
Implementing Service Discovery
How do we implement discovery and client side LB?
Each module registers itself with the local Consul agent on startup and provides
enough metadata (tags) to allow filtering.
PUT http://localhost:8500/v1/agent/service/register
{
  "ID": "Hello0",
  "Name": "Hello",
  "Port": 8080,
  "Tags": [
    "instance0",
    "production",
    "httpPort-8080",
    "contextPath-/api"
  ],
  "Check": {
    "HTTP": "http://localhost:8080/api/hello/instance",
    "Interval": "1s",
    "Timeout": "1s"
  }
}
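A rough Java sketch of the registration call, assuming Java 11+ (java.net.http) and the /v1/agent/service/register endpoint listed under REST API. ConsulRegistration and its method names are made up for illustration. The JSON is assembled by hand and assumes the values need no escaping; a real module would use a JSON library.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical helper: registers a service instance with the local Consul agent. */
final class ConsulRegistration {

    private ConsulRegistration() {}

    /** Builds a registration body; assumes the values contain nothing that needs JSON escaping. */
    static String registrationJson(String id, String name, int port, String checkUrl) {
        return "{"
            + "\"ID\":\"" + id + "\","
            + "\"Name\":\"" + name + "\","
            + "\"Port\":" + port + ","
            + "\"Tags\":[\"production\"],"
            + "\"Check\":{\"HTTP\":\"" + checkUrl + "\",\"Interval\":\"1s\",\"Timeout\":\"1s\"}"
            + "}";
    }

    /** Builds the PUT request against the agent, with a short timeout of its own. */
    static HttpRequest registerRequest(String agentBaseUrl, String json) {
        return HttpRequest.newBuilder(URI.create(agentBaseUrl + "/v1/agent/service/register"))
            .timeout(Duration.ofSeconds(2))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(json))
            .build();
    }

    /** Sends the request and returns the HTTP status code (200 means the agent accepted it). */
    static int register(HttpClient client, HttpRequest request)
            throws IOException, InterruptedException {
        return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
    }

    public static void main(String[] args) {
        String json = registrationJson(
            "Hello0", "Hello", 8080, "http://localhost:8080/api/hello/instance");
        HttpRequest request = registerRequest("http://localhost:8500", json);
        // Prints what would be sent; call register(HttpClient.newHttpClient(), request)
        // on a machine that runs a local agent.
        System.out.println(request.method() + " " + request.uri());
        System.out.println(json);
    }
}
```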
How do we implement discovery and client side LB?
The local Consul agent calls the provided health check(s) and verifies that the instances are
healthy.
Don’t forget to add proper timeouts!
curl --fail --max-time 1 "http://localhost:8080/api/hello/instance"
How do we implement discovery and client side LB?
Clients perform long-polling (blocking) queries against the health API, maintain a list of
healthy instances, and build target URLs.
At Outbrain we use the ConsulBasedTargetProvider with HealthyTargetsList
to achieve this.
http://localhost:8500/v1/health/service/Hello?passing=true&tag=production&stale=true&index={index}&wait=30s
The response header carries the index to send with the next query, e.g. X-Consul-Index: 4245721
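One round of that long poll could look roughly like this in plain Java 11+. This is a sketch, not the ob1k implementation: HealthWatcher and its method names are hypothetical, and the response body is left as raw JSON. Resetting the index to 0 when it goes backwards follows what I understand to be Consul's guidance for blocking queries.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical helper: long-polls the Consul health API for one service. */
final class HealthWatcher {

    private HealthWatcher() {}

    /** Builds the blocking health query; index 0 returns immediately with the current state. */
    static URI healthUri(String agentBaseUrl, String service, String tag,
                         long index, int waitSeconds) {
        return URI.create(agentBaseUrl + "/v1/health/service/" + service
            + "?passing=true&tag=" + tag + "&stale=true"
            + "&index=" + index + "&wait=" + waitSeconds + "s");
    }

    /** Picks the index for the next query from the X-Consul-Index header value. */
    static long nextIndex(long currentIndex, String headerValue) {
        if (headerValue == null) {
            return currentIndex;
        }
        long parsed = Long.parseLong(headerValue.trim());
        // An index that went backwards means our view is stale: start over from 0.
        return parsed < currentIndex ? 0 : parsed;
    }

    /** One long-poll round trip; returns the index to use for the next call. */
    static long pollOnce(HttpClient client, URI uri, long currentIndex, int waitSeconds)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri)
            // The client timeout must exceed the server-side wait, or every poll times out.
            .timeout(Duration.ofSeconds(waitSeconds + 10L))
            .GET()
            .build();
        HttpResponse<String> response =
            client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            return currentIndex;
        }
        long newIndex = nextIndex(currentIndex,
            response.headers().firstValue("X-Consul-Index").orElse(null));
        if (newIndex != currentIndex) {
            // Something may have changed: rebuild the healthy target list from the body.
            System.out.println(response.body());
        }
        return newIndex;
    }

    public static void main(String[] args) {
        // Prints the first query only; loop over pollOnce(...) against a real agent.
        System.out.println(healthUri("http://localhost:8500", "Hello", "production", 0, 30));
    }
}
```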
How do we implement discovery and client side LB?
Upon client request, we select a target based on some strategy (e.g. round-robin).
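Round-robin over the current healthy list can be sketched in a few lines. RoundRobinSelector is a hypothetical name, and this is not necessarily how ob1k selects targets. The list is passed in on every call, so the selector always works on the latest snapshot produced by the long poll.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical thread-safe round-robin selector over a snapshot of healthy targets. */
final class RoundRobinSelector<T> {

    private final AtomicInteger counter = new AtomicInteger();

    /** Returns the next target in rotation; the list may differ between calls. */
    T select(List<T> healthyTargets) {
        if (healthyTargets.isEmpty()) {
            throw new IllegalStateException("no healthy targets");
        }
        // floorMod keeps the position non-negative even after the counter overflows.
        int position = Math.floorMod(counter.getAndIncrement(), healthyTargets.size());
        return healthyTargets.get(position);
    }

    public static void main(String[] args) {
        RoundRobinSelector<String> selector = new RoundRobinSelector<>();
        List<String> targets = List.of(
            "http://host1:8080/api", "http://host2:8081/api", "http://host3:8082/api");
        for (int i = 0; i < 4; i++) {
            System.out.println(selector.select(targets));
        }
    }
}
```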
How do we implement discovery and client side LB?
Clients need to implement resilience logic such as retries, timeouts, circuit breakers,
etc.
final HelloService helloService = new ClientBuilder<>(HelloService.class)
    .setProtocol(ContentType.JSON)
    .setConnectionTimeout(100)
    .setRequestTimeout(100)
    .setRetries(3)
    .setTargetProvider(new ConsulBasedTargetProvider(healthyTargetsList, "/hello", null))
    .build();
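The builder above leaves retries to the client library. For clients without such support, the retry idea can be sketched in plain Java. Retries is a hypothetical helper: it treats every RuntimeException as retryable and applies no backoff, which a production client would want to refine.

```java
import java.util.function.Supplier;

/** Hypothetical retry helper: one initial attempt plus up to maxRetries retries. */
final class Retries {

    private Retries() {}

    static <T> T withRetries(Supplier<T> call, int maxRetries) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        RuntimeException lastFailure = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                // In a discovery-aware client the supplier would pick the next target
                // on every attempt, so a retry lands on a different instance.
                return call.get();
            } catch (RuntimeException e) {
                lastFailure = e;
            }
        }
        // All attempts failed: surface the last error to the caller.
        throw lastFailure;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = withRetries(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("instance unavailable");
            }
            return "hello";
        }, 3);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```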
Discovery and Client Side LB Demo
Demo Preview
[Diagram. Labels: service instances Hello0, Hello1, Hello2; ports :8080, :8081, :8082; steps "Register" and "Call service". The slide repeats the registration request body and the health query URL from the implementation slides.]
References & Links
● Consul Docs - https://consul.io/docs/index.html
● Example Source Code - https://github.com/outbrain/ob1k/tree/master/ob1k-example/src/main/java/com/outbrain/ob1k/example/hello
We are recruiting...
http://www.outbrain.com/about/careers

Service discovery like a pro (presented at reversimX)
