4. Cloud infrastructure team
Maxime Poirier-Journeault
Cloud system specialist
Jérome Heil
Cloud system developer
Alexandre Rousseau
Cloud system developer
Dave Robitaille
Cloud system specialist
Martin Ouellet
Cloud infra team lead
5. Where we’re at
● Kubernetes is the main compute engine
● 95% of the workload runs on Linux
● Everything is deployed autonomously
12 K8S clusters
8 Production clusters
12500 Pods
370 Nodes
7 Regions worldwide
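Dividing the totals above gives a rough sense of density. These are fleet-wide averages only, so individual clusters and nodes will differ:

```latex
\frac{12\,500\ \text{pods}}{370\ \text{nodes}} \approx 33.8\ \text{pods per node},
\qquad
\frac{370\ \text{nodes}}{12\ \text{clusters}} \approx 30.8\ \text{nodes per cluster}
```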
8. Prometheus
• Defied the gods by stealing fire from them
and giving it to humanity in the form of
technology and knowledge
• Generally seen as the author of the human
arts and sciences
9. Prometheus alone only goes so far
- Good for the first few million time series
- Limited redundancy
- Long retention is costly
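To put the retention cost in numbers, the Prometheus storage documentation gives a sizing rule of thumb: disk space ≈ retention time in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample after compression. The sketch below applies that rule to a hypothetical fleet of one million active series scraped every 30 seconds; those inputs and the helper names are illustrative assumptions, not measured figures.

```python
def samples_per_second(active_series: int, scrape_interval_seconds: float) -> float:
    """Each active series produces one sample per scrape."""
    return active_series / scrape_interval_seconds


def prometheus_disk_bytes(retention_seconds: float,
                          ingested_samples_per_second: float,
                          bytes_per_sample: float = 2.0) -> float:
    """Rough local TSDB size, following the Prometheus storage docs' rule:
    retention_time_seconds * ingested_samples_per_second * bytes_per_sample.
    bytes_per_sample defaults to the pessimistic end of the 1-2 byte range."""
    return retention_seconds * ingested_samples_per_second * bytes_per_sample


if __name__ == "__main__":
    DAY = 24 * 3600
    # Hypothetical fleet: 1 million active series scraped every 30 s.
    rate = samples_per_second(1_000_000, 30)
    for days in (15, 90, 365):
        gb = prometheus_disk_bytes(days * DAY, rate) / 1e9
        print(f"{days:>3} days: ~{gb:,.1f} GB per Prometheus replica")
```

With these assumptions the disk grows by about 5.76 GB per day, so roughly 86 GB for 15 days and over 2 TB for a year, for every replica that keeps its own local copy.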
10. Thanos
(Thanatos)
• God of non-violent deaths
• Merciless and indiscriminate, hated by
mortals and gods alike
• Also a reference to the “snap of
disintegration” in Avengers
11. High Availability
- Two Prometheus instances (replicas)
- Identical configuration
- Pod anti-affinity and topology spread constraints
- PodDisruptionBudget
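A minimal sketch of the Kubernetes objects behind these bullets, built as Python dicts and printable as JSON (which `kubectl` accepts). The label `app=prometheus`, the PDB name and the zone-level spread are hypothetical choices. If Prometheus is managed by the Prometheus Operator (an assumption, although ServiceMonitors are an Operator resource), its `Prometheus` custom resource accepts the same `replicas`, `affinity` and `topologySpreadConstraints` fields.

```python
import json


def ha_scheduling_spec(app_label: str, replicas: int = 2) -> dict:
    """Scheduling fields for an HA pair of identical Prometheus pods (sketch)."""
    selector = {"matchLabels": {"app": app_label}}
    return {
        "replicas": replicas,
        "affinity": {
            "podAntiAffinity": {
                # Hard rule: never schedule two replicas on the same node.
                "requiredDuringSchedulingIgnoredDuringExecution": [
                    {"labelSelector": selector,
                     "topologyKey": "kubernetes.io/hostname"}
                ]
            }
        },
        "topologySpreadConstraints": [
            {
                # Keep per-zone replica counts within 1 of each other.
                "maxSkew": 1,
                "topologyKey": "topology.kubernetes.io/zone",
                "whenUnsatisfiable": "DoNotSchedule",
                "labelSelector": selector,
            }
        ],
    }


def pod_disruption_budget(name: str, app_label: str) -> dict:
    """PDB so voluntary disruptions (e.g. node drains) leave one replica up."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name},
        "spec": {
            "minAvailable": 1,
            "selector": {"matchLabels": {"app": app_label}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(ha_scheduling_spec("prometheus"), indent=2))
    print(json.dumps(pod_disruption_budget("prometheus-pdb", "prometheus"), indent=2))
```

With two replicas and `minAvailable: 1`, a drain can evict at most one Prometheus pod at a time, while the anti-affinity rule keeps a single node failure from taking out both.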
16. Your software exposes a metric
There are many ready-made exporters,
but we also write our own
• Default exporters from Kubernetes
• Node exporter in the base AMI
• Custom exporters for Coveo software
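A custom exporter only has to serve its metrics over HTTP in the Prometheus text exposition format. The official `prometheus_client` library handles this in practice; the sketch below uses only the Python standard library so the format is visible. The metric `example_jobs_processed_total`, its `queue` label and the in-memory counts are hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical metric data: jobs processed per queue.
JOBS_PROCESSED = {"default": 42, "priority": 7}


def escape_label_value(value: str) -> str:
    # The text format requires escaping backslash, double quote and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(jobs: dict) -> str:
    """Render one counter family in the Prometheus text exposition format."""
    lines = [
        "# HELP example_jobs_processed_total Number of jobs processed.",
        "# TYPE example_jobs_processed_total counter",
    ]
    for queue, count in sorted(jobs.items()):
        lines.append(
            f'example_jobs_processed_total{{queue="{escape_label_value(queue)}"}} {count}'
        )
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(JOBS_PROCESSED).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever, serving /metrics on the given port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` exposes the endpoint on port 8000; Prometheus then only needs to be told where to scrape it.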
17. The metric is scraped
And enters the time series database
• Uses ServiceMonitors
• Most metrics are served on the /metrics endpoint
• Queryable through the Thanos sidecar
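What a scrape does with the payload can be sketched as a parser: each sample line becomes a series identified by its metric name plus its sorted label set, stamped with the scrape time unless the line carries its own timestamp. This is a simplified illustration, not Prometheus's actual parser; it does not unescape label values, and real Prometheus also attaches target labels such as `job` and `instance`, which the sketch skips. The function name `parse_exposition` is hypothetical.

```python
import re

# name{label="value",...} value [timestamp_ms]
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)(?:\s+(?P<ts>-?\d+))?$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text: str, scrape_ts_ms: int) -> dict:
    """Map series identity (sorted label pairs incl. __name__) -> (ts_ms, value)."""
    series = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and HELP/TYPE comments carry no samples
        m = SAMPLE_RE.match(line)
        if m is None:
            continue  # sketch: skip lines this simplified grammar does not cover
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        labels["__name__"] = m.group("name")
        key = tuple(sorted(labels.items()))
        ts = int(m.group("ts")) if m.group("ts") else scrape_ts_ms
        series[key] = (ts, float(m.group("value")))  # float() accepts NaN, +Inf
    return series
```

Keying on the full sorted label set mirrors how a time series is identified: changing any label value creates a new series.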
18. Groups of 3 hours are uploaded to S3
Prometheus packs every 3 hours of data
into a block, which stores the samples
of each time series as chunks.
The Thanos sidecar uploads each
completed block to AWS S3.
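The grouping step can be sketched as bucketing samples into fixed 3-hour windows, one bucket per window with the samples of every series inside it; each closed window corresponds to one block for the sidecar to upload. We assume windows aligned to multiples of the block length since the Unix epoch, and the helper names are hypothetical. Note that upstream Prometheus defaults to 2-hour blocks, so 3 hours is treated here as a configured value.

```python
from collections import defaultdict

BLOCK_HOURS = 3  # per the slide; upstream Prometheus defaults to 2-hour blocks
BLOCK_MS = BLOCK_HOURS * 3600 * 1000


def block_start(ts_ms: int, block_ms: int = BLOCK_MS) -> int:
    """Start of the epoch-aligned window that contains ts_ms."""
    return ts_ms - ts_ms % block_ms


def group_into_blocks(samples, block_ms: int = BLOCK_MS) -> dict:
    """samples: iterable of (series_key, ts_ms, value).

    Returns {window_start_ms: {series_key: [(ts_ms, value), ...]}}, i.e. one
    bucket per window holding the samples of every series in that window.
    """
    blocks = defaultdict(lambda: defaultdict(list))
    for key, ts, value in samples:
        blocks[block_start(ts, block_ms)][key].append((ts, value))
    return blocks
```

With 3-hour windows, each Prometheus replica produces 24 / 3 = 8 blocks per day to upload.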