HashiConf '19
Explaining how we use Inversion of Control at Criteo to create very effective types of services
https://hashiconf.hashicorp.com/schedule/inversion-of-control-with-consul
1. Pierre Souchay
Discovery Team @Criteo
Twitter: @vizionr
GitHub: pierresouchay
Inversion of Control with Consul
Leading Discovery Team @Criteo (SDKs + Consul)
Dealing with 240k+ services, 38k Consul nodes in 9 DCs
1st external contributor to Consul
Author of consul-templaterb
2. Today’s dishes
• Starters: History of Consul at Criteo
• Entrées: Inversion of Control explained
• Cheese: Real World Examples
• Sweets: How it changes infrastructure
10. 2015 - Mesos
• Containers
• Frequent changes
• Many services per machine
• Different provisioning
13. Sounds almost good enough but…
1. Provisioning time is an issue globally (F5)
2. Database polling shows its limits
3. Services both in containers and on machines?
4. More latency introduced by new load balancers
15. Consul to discover everything
• No SPOF
• Multi DC support
• Service oriented
• Real time updates
• Toolbox (KV, locks)
• DNS integration!
• Works at the IP level
19. Step 3 was harder
• Watching changes on many services
→ CPU/network cost: per-service index: #3899 (and many more)
• Leader gets saturated
→ discovery_max_stale: #3920
• DNS issues on big services
→ DNS fixes: #3940, #3948, #4071,
• 800mb/s to watch changes
→ consul-templaterb: now 12kb/s
• Weights in services / meta in services
→ #3881 / #4047 / #4468
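The watch problem above comes down to how clients use Consul blocking queries: a request such as GET /v1/health/service/<name>?index=<idx>&wait=5m hangs until the data behind that index changes, and the response carries the new index in the X-Consul-Index header. Below is a minimal sketch of the client-side bookkeeping, with no network code so it runs on its own. Resetting when the index goes backwards follows Consul's blocking-query guidance; the guard against index 0 and the byte-level change check are assumptions of this sketch, not consul-templaterb's actual code.

```python
"""Index bookkeeping for a Consul blocking-query watch loop (sketch only)."""


def next_index(previous: int, returned: int) -> int:
    """Index to send with the next ?index= request."""
    if returned < previous:
        # The index went backwards (e.g. after a snapshot restore):
        # restart with a fresh, non-blocking read.
        return 0
    if returned <= 0:
        # Assumption of this sketch: never block on index 0, otherwise
        # the query returns at once and the loop spins.
        return 1
    return returned


def should_notify(previous_body: bytes, new_body: bytes) -> bool:
    """A blocking query can return without a real change; react only on a diff."""
    return previous_body != new_body


# Typical sequence: initial read, normal progress, then an index reset.
index = 0
for returned in (120, 135, 40):
    index = next_index(index, returned)
    print(index)  # 120, then 135, then 0
```

Only re-rendering when the body actually changed is what keeps a watcher cheap when thousands of services are observed at once.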
24. What did we learn about our users?
• They love their services → put configuration into their systems
• They want predictability → give them tools to investigate
• They love business semantics → focus on semantics, ignore tools
• They want it fast and magic → magic is better than As A Service
25. Can we go further?
Can we change the way we create infrastructure and tools?
27. Inversion of Control
Decoupling systems using a framework
You describe your needs as semantics
Someone will provide what you need (and much more)
Broader than Dependency Injection
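A toy illustration of the inversion: the service only declares its semantics, and tools that registered interest react to it. The Catalog class below is a hypothetical in-memory stand-in for Consul (not its API), and the alert_latency_p99_ms key is invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Meta = Dict[str, str]


@dataclass
class Catalog:
    """Toy stand-in for the Consul catalog (hypothetical, not Consul's API)."""
    services: Dict[str, Meta] = field(default_factory=dict)
    watchers: List[Callable[[str, Meta], None]] = field(default_factory=list)

    def watch(self, callback: Callable[[str, Meta], None]) -> None:
        # Tools subscribe themselves; services never learn who is listening.
        self.watchers.append(callback)

    def register(self, name: str, meta: Meta) -> None:
        # A service only declares what it is and what it needs.
        self.services[name] = meta
        for callback in self.watchers:
            callback(name, meta)


provisioned: List[str] = []


def alerting_tool(name: str, meta: Meta) -> None:
    # Hypothetical consumer: reacts to any alert_* key it understands.
    if any(key.startswith("alert_") for key in meta):
        provisioned.append(f"alerts for {name}")


catalog = Catalog()
catalog.watch(alerting_tool)
catalog.register("billing", {"alert_latency_p99_ms": "200", "team": "payments"})
catalog.register("batch-job", {"team": "data"})
print(provisioned)  # ['alerts for billing']
```

The control flow is inverted compared to a classic setup: nobody calls the alerting tool or edits its configuration; registering the service is enough.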
29. Consul exposes lots of stuff
• List all services
• Filter services using tags
• Notifications in real time
• Provide configuration settings (KV/Service Data)
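Listing services and filtering them by tag maps directly to the catalog API: GET /v1/catalog/services returns a JSON object mapping each service name to its list of tags. This sketch filters such a response locally; the service names and tags in the sample are made up.

```python
import json
from typing import Dict, List


def services_with_tag(catalog_services: Dict[str, List[str]], tag: str) -> List[str]:
    """Names of services carrying `tag`, sorted for stable output."""
    return sorted(name for name, tags in catalog_services.items() if tag in tags)


# Shape of a GET /v1/catalog/services response (service name -> tags).
# The names and tags below are invented for illustration.
response = json.loads('{"consul": [], "web": ["http", "swagger"], "api": ["http"]}')
print(services_with_tag(response, "http"))  # ['api', 'web']
```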
30. Let’s use those features
Expose searchable semantics using tags
Provide configuration hints with business semantics as meta
Tools observe, react & provision
Consul is like an infra DB
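Registering a service with searchable tags and business meta goes through the agent endpoint PUT /v1/agent/service/register, whose JSON body accepts Name, Port, Tags and Meta fields. The sketch below builds that body and checks Consul's documented meta limits first (at most 64 pairs, keys of up to 128 letters, digits, "-" or "_", values of up to 512 characters, "consul-" prefix reserved). The meta keys in the example are hypothetical.

```python
import json
import re
from typing import Dict, List

# Consul's documented limits on service/node meta.
MAX_PAIRS = 64
MAX_VALUE_LEN = 512
KEY_RE = re.compile(r"[A-Za-z0-9_-]{1,128}")
RESERVED_PREFIX = "consul-"


def meta_errors(meta: Dict[str, str]) -> List[str]:
    """Human-readable problems; an empty list means the meta is valid."""
    errors = []
    if len(meta) > MAX_PAIRS:
        errors.append(f"too many meta pairs ({len(meta)})")
    for key, value in meta.items():
        if not KEY_RE.fullmatch(key):
            errors.append(f"invalid key: {key!r}")
        if key.startswith(RESERVED_PREFIX):
            errors.append(f"reserved key: {key!r}")
        if len(value) > MAX_VALUE_LEN:
            errors.append(f"value too long for {key!r}")
    return errors


def registration_body(name: str, port: int, tags: List[str], meta: Dict[str, str]) -> str:
    """JSON body for PUT /v1/agent/service/register."""
    errors = meta_errors(meta)
    if errors:
        raise ValueError("; ".join(errors))
    return json.dumps({"Name": name, "Port": port, "Tags": tags, "Meta": meta})


# Hypothetical meta keys: an alerting hint plus purely informational data.
body = registration_body(
    "billing", 8080, ["http"],
    {"alert_min_availability": "0.9", "team": "payments", "version": "1.4.2"},
)
print(body)
```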
34. Why is meta so cool?
Direct configuration
• alert_* => automatic alerts
• vip_* => VAAS configuration
• swagger_* => Swagger repository
…and information
• version
• start
• team
• OWNERS…
…all automatically cleaned up
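Prefix conventions such as alert_* or vip_* let every consumer read only its own slice of the meta and ignore the rest. A minimal sketch of that routing; the prefixes come from the slide, while the concrete keys in the sample are hypothetical.

```python
from typing import Dict, Tuple

Meta = Dict[str, str]


def with_prefix(meta: Meta, prefix: str) -> Meta:
    """Keys starting with `prefix`, returned with the prefix stripped."""
    return {k[len(prefix):]: v for k, v in meta.items() if k.startswith(prefix)}


def route(meta: Meta, prefixes: Tuple[str, ...] = ("alert_", "vip_", "swagger_")) -> Dict[str, Meta]:
    """Give each consumer only the part of the meta it understands."""
    routed = {}
    for prefix in prefixes:
        part = with_prefix(meta, prefix)
        if part:
            routed[prefix.rstrip("_")] = part
    return routed


meta = {
    "alert_min_availability": "0.9",
    "vip_name": "billing",
    "version": "1.4.2",  # informational, not routed to any consumer
    "team": "payments",
}
print(route(meta))  # {'alert': {'min_availability': '0.9'}, 'vip': {'name': 'billing'}}
```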
35. Consumers of those meta can be…
OPT-IN
• VAAS (network load balancing)
• Swagger repository
OPT-OUT
• Chaos Monkey
• Security Scanner
• Automatic Alerting
• App Watcher
• Version Scanner
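The OPT-IN/OPT-OUT split boils down to one predicate per consumer. In this sketch, opt-in means "act only if my key is present" and opt-out means "act unless the service disables me"; the vip_name and chaos_monkey_disabled keys are hypothetical names, not Criteo's actual conventions.

```python
from typing import Dict, List

Meta = Dict[str, str]


def opted_in(meta: Meta, key: str) -> bool:
    """OPT-IN consumers (e.g. VAAS) act only when the service asks for them."""
    return key in meta


def opted_out(meta: Meta, key: str) -> bool:
    """OPT-OUT consumers (e.g. Chaos Monkey) act unless the service disables them."""
    return meta.get(key, "").lower() in ("true", "1", "yes")


def targets(services: Dict[str, Meta], key: str, opt_in: bool) -> List[str]:
    """Services a consumer should handle, given its opt-in/opt-out policy."""
    if opt_in:
        return sorted(n for n, m in services.items() if opted_in(m, key))
    return sorted(n for n, m in services.items() if not opted_out(m, key))


services = {
    "billing": {"vip_name": "billing"},
    "ledger": {"chaos_monkey_disabled": "true"},
    "search": {},
}
print(targets(services, "vip_name", opt_in=True))                # ['billing']
print(targets(services, "chaos_monkey_disabled", opt_in=False))  # ['billing', 'search']
```

Opt-out gives coverage by default (every new service is scanned or alerted on), while opt-in suits consumers that allocate resources, such as a load balancer VIP.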
36. More meta, more power, more services
Node meta + service meta: 2 layers
Metrics can re-use it as well (e.g. Prometheus/Consul integration)
Templates are reusable
The same meta can be re-used for new tools
It gets easier and easier
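Both layers show up together in each entry returned by GET /v1/health/service/<name>: Node.Meta carries machine-level facts and Service.Meta carries per-instance facts. A sketch merging them for a consumer; letting service meta win on conflicting keys is this sketch's own choice, and the rack/team/version keys are hypothetical.

```python
from typing import Dict

Meta = Dict[str, str]


def effective_meta(entry: dict) -> Meta:
    """Node meta first, then service meta on top (service wins on conflicts)."""
    merged: Meta = dict(entry["Node"].get("Meta") or {})
    merged.update(entry["Service"].get("Meta") or {})
    return merged


# Trimmed-down shape of one element of GET /v1/health/service/<name>.
entry = {
    "Node": {"Node": "host-1", "Address": "10.0.0.1",
             "Meta": {"rack": "r12", "team": "infra"}},
    "Service": {"ID": "billing-1", "Service": "billing",
                "Meta": {"version": "1.4.2", "team": "payments"}},
    "Checks": [],
}
print(effective_meta(entry))
# {'rack': 'r12', 'team': 'payments', 'version': '1.4.2'}
```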
37. Isn't K/V the right place instead of meta?
Most of the time… no
Cardinality is hard to get right: a service is NOT a monolith
Cleanup is just too hard
It means a bit more work on the consumer side, because of cardinality
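Cardinality in practice: every instance carries its own meta, so a rolling deployment shows up as several versions of the same service at once, which a single K/V key per service cannot express. A sketch counting instances per version from health entries; the version key is an assumed convention.

```python
from collections import Counter
from typing import Dict, List


def versions(entries: List[dict]) -> Dict[str, int]:
    """Instances per version, read from each instance's own meta."""
    return dict(Counter(
        (e["Service"].get("Meta") or {}).get("version", "unknown") for e in entries
    ))


# Three instances of one service in the middle of a rolling deployment.
entries = [
    {"Service": {"ID": "billing-1", "Meta": {"version": "1.4.2"}}},
    {"Service": {"ID": "billing-2", "Meta": {"version": "1.4.2"}}},
    {"Service": {"ID": "billing-3", "Meta": {"version": "1.5.0"}}},
]
print(versions(entries))  # {'1.4.2': 2, '1.5.0': 1}
```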
40. Automatic Metering / Alerts
• consul-templaterb templates generate Prometheus alerts
• Provide 100% coverage of Criteo for free
• Also provide metrics such as availability for all of Criteo
• App availability according to version/OS/rack, using meta!
• Re-use those meta in all metrics
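The same idea sketched in Python rather than as a consul-templaterb (Ruby/ERB) template: turn alert_* meta into a Prometheus rule-file structure. The alert_min_availability key, the rule names and the assumption that each service is scraped as a Prometheus job named after it are all hypothetical; the groups/rules/alert/expr/for/labels layout is Prometheus's alerting-rule format.

```python
import json
from typing import Dict, List

Meta = Dict[str, str]


def alert_rules(services: Dict[str, Meta]) -> dict:
    """Build a Prometheus rule-file structure from alert_* meta (sketch)."""
    rules: List[dict] = []
    for name, meta in sorted(services.items()):
        threshold = meta.get("alert_min_availability")  # hypothetical key
        if threshold is None:
            continue  # no alert requested by this service
        rules.append({
            "alert": f"{name}_availability_low",
            # Assumes each service is scraped as its own Prometheus job.
            "expr": f'avg(up{{job="{name}"}}) < {float(threshold)}',
            "for": "5m",
            "labels": {"team": meta.get("team", "unknown")},
        })
    return {"groups": [{"name": "consul-generated", "rules": rules}]}


services = {
    "billing": {"alert_min_availability": "0.9", "team": "payments"},
    "batch-job": {"team": "data"},
}
# JSON is accepted by YAML parsers, so this can be written to a rules file.
print(json.dumps(alert_rules(services), indent=2))
```

Because the rules are regenerated from the catalog, a service that disappears also drops its alert without anyone editing a rules file.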
41. VAAS
• Provides all networking for Criteo
• Serving more than 4M HTTP req/s
• Shares semantics across several load balancers
  • HAProxy
  • F5
• Provisions much more than reverse proxies
  • DNS (including Geo-DNS)
  • TLS
• Real-time creation of services (less than 1 minute)
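Provisioning a load balancer from the catalog can be as simple as rendering one backend per service from its health entries. This sketch emits an HAProxy backend section: it keeps only instances whose checks all pass, falls back to the node address when the service address is empty (as Consul clients are expected to), and maps the service's Weights.Passing to the HAProxy server weight, clamped to HAProxy's 0..256 range. It is an illustration of the principle, not VAAS itself.

```python
from typing import List


def haproxy_backend(service: str, entries: List[dict]) -> str:
    """Render an HAProxy backend section from /v1/health/service/<name> entries."""
    lines = [f"backend {service}", "    balance roundrobin"]
    for e in entries:
        if any(check["Status"] != "passing" for check in e.get("Checks", [])):
            continue  # leave unhealthy instances out of the pool
        svc = e["Service"]
        # An empty service address means "use the node's address".
        addr = svc.get("Address") or e["Node"]["Address"]
        weight = (svc.get("Weights") or {}).get("Passing", 1)
        weight = max(0, min(weight, 256))  # HAProxy accepts weights 0..256
        lines.append(f"    server {svc['ID']} {addr}:{svc['Port']} check weight {weight}")
    return "\n".join(lines) + "\n"


entries = [
    {"Node": {"Address": "10.0.0.1"},
     "Service": {"ID": "billing-1", "Address": "", "Port": 8080,
                 "Weights": {"Passing": 10, "Warning": 1}},
     "Checks": [{"Status": "passing"}]},
    {"Node": {"Address": "10.0.0.2"},
     "Service": {"ID": "billing-2", "Address": "10.0.0.2", "Port": 8080},
     "Checks": [{"Status": "critical"}]},
]
print(haproxy_backend("billing", entries), end="")
```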
42. Services Scanner
• Detect old applications
• Detect invalid ownership
• Old security groups
• Deprecated users…
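A services scanner is just another consumer of the same meta. The sketch flags services reporting a version older than a floor, or a team that is not in a known list; the version and team keys, and the assumption that versions are plain dotted numbers, are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Meta = Dict[str, str]


def parse_version(text: str) -> Tuple[int, ...]:
    """Assumes plain dotted numeric versions such as "1.4.2"."""
    return tuple(int(part) for part in text.split("."))


def findings(services: Dict[str, Meta], minimum: str, known_teams: Set[str]) -> List[str]:
    """Problems found in service meta, one line per issue."""
    floor = parse_version(minimum)
    out = []
    for name, meta in sorted(services.items()):
        version = meta.get("version")
        if version is None or parse_version(version) < floor:
            out.append(f"{name}: outdated or unknown version ({version})")
        team = meta.get("team")
        if team not in known_teams:
            out.append(f"{name}: invalid ownership ({team})")
    return out


services = {
    "billing": {"version": "1.4.2", "team": "payments"},
    "legacy": {"version": "0.9.1", "team": "ghosts"},
}
for finding in findings(services, "1.0.0", {"payments"}):
    print(finding)
```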
43. Consul-UI / Consul-Timeline
• Live logs for all services
• History of services
• http://github.com/criteo/consul-templaterb/
• Provides real-time updates about the status of all services
• Provides a history of changes for all services
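A timeline can be built by diffing successive snapshots of health data. The sketch aggregates each instance's checks to its worst status (passing < warning < critical) and emits one event per change, including instances that appear or disappear; it is not the actual consul-templaterb timeline code, and the instance IDs are made up.

```python
from typing import Dict, List, Tuple

SEVERITY = {"passing": 0, "warning": 1, "critical": 2}


def aggregate(checks: List[dict]) -> str:
    """Worst status among an instance's checks (no checks counts as passing)."""
    return max((c["Status"] for c in checks), key=SEVERITY.__getitem__, default="passing")


def changes(before: Dict[str, str], after: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """(instance, old status, new status) for each instance whose status changed."""
    events = []
    for instance in sorted(before.keys() | after.keys()):
        old = before.get(instance, "absent")
        new = after.get(instance, "absent")
        if old != new:
            events.append((instance, old, new))
    return events


before = {"billing-1": "passing", "billing-2": "passing"}
after = {
    "billing-1": aggregate([{"Status": "passing"}, {"Status": "critical"}]),
    "billing-3": aggregate([]),
}
for event in changes(before, after):
    print(event)
```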
44. And much more…
• Swagger browser (catalog of all JSON APIs in Criteo)
• Chaos Monkey
• Resource Tracking Systems
• Latency Monitoring between machines
• Security Scanner looking up new services to scan
46. Removes configuration from hidden places
If you are providing a new cross-service system, you probably don’t need a Git repository for its configuration
So everything is transparent and open to everybody
Information is where it needs to be: on the service itself
Eases onboarding of newcomers
47. Cleanup is not a hard problem anymore
Systems live and die, consumers react
Ops synchronization is not needed anymore
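Why cleanup becomes easy: a consumer periodically reconciles what it has provisioned against what the catalog currently declares, so a service that dies simply drops out of the wanted set. A minimal reconciliation sketch; the vip_name opt-in key and the service names are hypothetical.

```python
from typing import Dict, Set, Tuple

Meta = Dict[str, str]


def reconcile(provisioned: Set[str], catalog: Dict[str, Meta], key: str) -> Tuple[Set[str], Set[str]]:
    """(to_create, to_delete) that make the tool's state mirror the catalog."""
    wanted = {name for name, meta in catalog.items() if key in meta}
    return wanted - provisioned, provisioned - wanted


provisioned = {"billing", "old-api"}  # what the tool created earlier
catalog = {"billing": {"vip_name": "billing"}, "search": {"vip_name": "search"}}
to_create, to_delete = reconcile(provisioned, catalog, "vip_name")
print(sorted(to_create), sorted(to_delete))  # ['search'] ['old-api']
```

No one has to tell the tool that old-api was decommissioned: its absence from the catalog is the signal.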
48. Helps innovation
Real decoupling
You can start your new project on your laptop
Templating systems create your configs easily
No migration costs anymore: we don’t configure tools
Semantics are better than YAML config files