2019 05-28 SRE Consul Criteo Meetup

This presentation explains the challenges we face at Criteo in discovering machines and services.

Criteo uses HashiCorp's Consul to discover services. We explain what Consul is, how it works, the challenges we faced, and how we improved it.

We also explain how using it in combination with consul-templaterb lets us apply Inversion of Control across the whole infrastructure, so that Consul serves as a database and we can iterate faster.
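
As a rough illustration of the Inversion of Control idea, the sketch below assumes service entries shaped like the JSON returned by Consul's health endpoint (`/v1/health/service/<name>`), where each service carries a `Meta` map of strings. The meta keys `protocol` and `alert_capacity_loss_pct`, the function names, and the sample data are all hypothetical; this is not how consul-templaterb itself is implemented.

```python
from typing import Dict, List


def make_entry(address: str, status: str) -> Dict:
    """Build a trimmed, hypothetical entry shaped like Consul health output."""
    return {
        "Service": {
            "Service": "web",
            "Address": address,
            "Port": 8443,
            # Hypothetical meta keys: the service declares what it wants.
            "Meta": {"protocol": "https", "alert_capacity_loss_pct": "40"},
        },
        "Checks": [{"Status": status}],
    }


# Five instances, two of them failing their health check.
ENTRIES = [
    make_entry(f"10.0.0.{i}", "passing" if i <= 3 else "critical")
    for i in range(1, 6)
]


def is_healthy(entry: Dict) -> bool:
    """An instance is healthy when all of its checks are passing."""
    return all(check["Status"] == "passing" for check in entry["Checks"])


def backend_lines(entries: List[Dict]) -> List[str]:
    """Render one backend URL per healthy instance, using the declared protocol."""
    lines = []
    for entry in entries:
        if not is_healthy(entry):
            continue
        svc = entry["Service"]
        proto = svc["Meta"].get("protocol", "http")
        lines.append(f"{proto}://{svc['Address']}:{svc['Port']}")
    return lines


def should_alert(entries: List[Dict]) -> bool:
    """True when the share of unhealthy instances reaches the declared threshold."""
    if not entries:
        return False
    meta = entries[0]["Service"]["Meta"]
    threshold = float(meta.get("alert_capacity_loss_pct", "100"))
    lost = sum(1 for entry in entries if not is_healthy(entry))
    return 100.0 * lost / len(entries) >= threshold
```

The point of the design is that the load-balancer and alerting tools carry no per-service configuration: they read what each service declared and react to it.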

Published in: Internet

  1. Consul Discovery: Challenges & Opportunities. Pierre Souchay (pierresouchay), Meetup S.R.E Tech Talk, Criteo Paris, 05/28/2019
  2. Discovery’s history from 2001 to 2010: EJB / 3-tier architecture; 1/2 Web; 1 App; 1 Database; Early SOA (Service Oriented Architecture)
  3. Discovery’s history, early 2010’s: SOA Step 2; 1 Service / Machine; 1 App needs several services
  4. Discovery now: SOA Step 3: µ-services; 1 container = 1 service(s); 10’s of services / machine; 1000’s of services; 3,000+ services; 200,000+ instances; 2,000+ instances/service; 35,000+ machines; 9 DCs
  5. We need tools
  6. Consul is a Discovery Database with fault-tolerance: Open-Source, 2014; No SPOF / Fault Tolerant; Distributed Agents on all machines; Services Oriented / DC aware; Real-time updates of services; Distributed toolbox (Locks, K/V…); Easy to integrate (DNS support); Can work on any IP network
  7. The team
  8. The team: 5 people; Create SDKs for other teams (JVM, C#, Python, Ruby); Handle all infrastructure, on-call 24/7; Architecture patterns; 1st worldwide contributors to Consul
  9. The challenges
  10. One of the biggest installations of Consul in the world: “Criteo has probably the most intensive usage of Consul in the world” (M. Hashimoto); Discover all instances of all systems in our applications; All the Load-Balancing, the DNS provisioning; Metrics, Alerting systems; Used with bare-metal (Windows, Linux), Mesos, Kubernetes, Hadoop
  11. #RealLife Consul At Scale: RPC queries/s/DC from ~1.5k/s to 9k/s (up to 1300 qps on a single service); 1 change/sec on a large service of many instances → 2k req/sec if 2k observers
  12. Our pull requests: ~100 PRs, 70+ merged upstream; ~15 PRs for features (Service.Meta, weights…); ~30 PRs merged for performance (DNS, watches); ~10 PRs merged for safety (node registration, memberlist…); ~2 PRs fixing security bugs; OSS UI: https://github.com/criteo/consul-templaterb/
  13. Improvements from our merged PRs: Bandwidth: from 1Gb/s to 12k/s; CPU: from 32/32 CPUs at 100% to 1/32 CPU at 100%; From 3/4 notifications/s to 1 notification/10 min for 1 service; From 1 incident / 10 days to no incident in 6 months; From a fragile tool to a database for the whole infrastructure; Prometheus improvements; Metadata for services…
  14. An OSS scalable UI for Consul (consul-templaterb)
  15. Opportunities: Building new stuff
  16. Load-Balancing in Consul: Load-Balancing unification; F5 / HAProxy / DNS / Service-Mesh; Weight issue; Snowball effect
  17. Inversion of Control: Services expose semantics: I want HTTPS; I speak Swagger; Call someone when I lost 40% of capacity. Tools observe, react and provision systems: Consul is an infrastructure database
  18. Inversion of Control
  19. Inversion of Control
  20. Building the future
  21. Infrastructure Service Mesh
  22. Infrastructure Service Mesh (Details)
  23. Q&A + Resources: GitHub: - Criteo’s patches: http://github.com/criteo-forks/consul/ - UI and template system: http://github.com/criteo/consul-templaterb/ Video resources of previous presentations: - HashiConf ‘18: https://www.hashicorp.com/resources/criteo-containers-consul-connect - HashiTalks (Operating Consul at Scale): https://www.youtube.com/watch?v=x9MiaV0WdSs - Devoxx ‘19: https://www.youtube.com/watch?v=aQb2_WrmED0 - Consul-Timeline (our UI): https://youtu.be/zLzrLGLLl4Q
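
The fan-out figure on the "#RealLife Consul At Scale" slide (1 change/sec with 2k observers giving 2k req/sec) is a simple multiplication, and limiting how often observers are notified is one generic way to bring it down. The sketch below illustrates both under stated assumptions: the function names are hypothetical, and the coalescing logic is a generic rate-limiting technique, not a description of the patches merged upstream.

```python
from typing import List


def fanout_requests_per_sec(changes_per_sec: float, observers: int) -> float:
    """Each change wakes every observer, and each observer re-issues one request."""
    return changes_per_sec * observers


def coalesce(change_times: List[float], min_interval: float) -> List[float]:
    """Return delivery times when at most one notification per min_interval is sent.

    The first change is delivered immediately. Changes that arrive before the
    next allowed delivery are merged into a single notification sent as soon
    as the interval has elapsed.
    """
    deliveries: List[float] = []
    for t in sorted(change_times):
        if deliveries and t <= deliveries[-1]:
            # Already covered by a delivery scheduled at or after this change.
            continue
        earliest = deliveries[-1] + min_interval if deliveries else t
        deliveries.append(max(t, earliest))
    return deliveries


# One change per second for 10 seconds, watched by 2,000 observers.
changes = list(range(10))
observers = 2000

unthrottled_requests = len(changes) * observers
throttled_requests = len(coalesce(changes, 5)) * observers
```

With these inputs, the ten changes collapse into three notifications, so the total number of observer requests drops in the same proportion. The trade-off is that observers may see a change up to `min_interval` later than it happened.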
