1. Debugging / Gain insight
"Where does the problem come from / What is going on?"
● Segment by sources (Google Ads, Fb Ads, Bing Ads, Taboola, etc.)
○ Did they slow down? Error rate gone up? Are they unavailable?
● Segment by category
○ Did we introduce a bug in that code?
● Segment by node
○ Do I have a problem on that node?
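To make the segmentation idea concrete, here is a minimal sketch in plain Python. It is a stand-in for labelled metrics, not the Prometheus client itself: the sample data, the label names (`source`, `category`, `node`) and the helper `error_rate_by` are all hypothetical.

```python
from collections import Counter

# Hypothetical samples: one (source, category, node, outcome) tuple per API call.
calls = [
    ("google_ads", "import", "worker-1", "ok"),
    ("google_ads", "import", "worker-2", "ok"),
    ("fb_ads", "import", "worker-1", "error"),
    ("fb_ads", "export", "worker-2", "error"),
    ("bing_ads", "export", "worker-1", "ok"),
    ("fb_ads", "import", "worker-2", "ok"),
]

LABELS = ("source", "category", "node")


def error_rate_by(label, samples):
    """Return {label value: error rate} for a single label dimension."""
    idx = LABELS.index(label)
    total, errors = Counter(), Counter()
    for sample in samples:
        key = sample[idx]
        total[key] += 1
        if sample[3] == "error":
            errors[key] += 1
    return {key: errors[key] / total[key] for key in total}


by_source = error_rate_by("source", calls)
by_node = error_rate_by("node", calls)
```

With this made-up data, the errors are spread evenly across nodes but concentrated on one source, which points at the source rather than at a node.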
3. Trends / Scale
● Trends over time, to drive scaling (technical) / business decisions
○ Capacity planning
○ "Will I (when will I) have a problem in the future?"
● SLA / QoS
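The "when will I have a problem?" question can be sketched as a linear extrapolation of a trend. This is an illustration only: the usage figures and the function `days_until_limit` are hypothetical. In Prometheus itself, the PromQL function `predict_linear` serves a similar purpose.

```python
def days_until_limit(daily_usage, limit):
    """Fit a least-squares line through daily_usage and return how many days
    after the last sample that line reaches `limit` (None if usage is not growing)."""
    n = len(daily_usage)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_usage) / n
    slope = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_usage)
    ) / sum((x - mean_x) ** 2 for x in xs)
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_day = (limit - intercept) / slope
    # Days counted from the last sample; 0 if the limit is already reached.
    return max(0.0, crossing_day - (n - 1))


# Hypothetical disk usage in percent, one sample per day.
usage = [50, 52, 54, 56, 58]
days_left = days_until_limit(usage, 90)
```

A straight line is the simplest possible model. Real capacity planning would also need to account for seasonality and step changes.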
And all this is available thanks to this code:
Push (vs pull)
- Async, short-lived processes
- The Prometheus way => send metrics to a push gateway
- One push gateway per process!
- More infrastructure to set up
- Our way, the prometheus-distributed-client => send metrics to a database
- Available from everywhere
- Consistent in case of concurrent calls
- Use either
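The "send metrics to a database" idea can be sketched as follows. This is NOT the prometheus-distributed-client API: it is a stand-in that uses `sqlite3` from the standard library as the shared store, and every name in it (`open_store`, `inc`, `read`, the `counters` table) is hypothetical. It only shows why a shared store suits short-lived processes: each process adds to a stored value, so the count survives after the process exits.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the shared metric store (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS counters ("
        " name TEXT, labels TEXT, value REAL NOT NULL DEFAULT 0,"
        " PRIMARY KEY (name, labels))"
    )
    return conn


def _label_key(labels):
    # Sort the labels so the same label set always maps to the same row.
    return ",".join(f"{k}={v}" for k, v in sorted(labels.items()))


def inc(conn, name, labels, amount=1.0):
    """Increment a counter inside the database, in one transaction."""
    key = _label_key(labels)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO counters (name, labels) VALUES (?, ?)",
            (name, key),
        )
        # The addition happens in SQL, not as a read-modify-write in the
        # client, so one writer does not overwrite another's increment.
        conn.execute(
            "UPDATE counters SET value = value + ? WHERE name = ? AND labels = ?",
            (amount, name, key),
        )


def read(conn, name, labels):
    row = conn.execute(
        "SELECT value FROM counters WHERE name = ? AND labels = ?",
        (name, _label_key(labels)),
    ).fetchone()
    return row[0] if row else 0.0


store = open_store()
for _ in range(3):  # three short-lived "jobs" reporting to the same store
    inc(store, "jobs_done_total", {"source": "google_ads"})
total = read(store, "jobs_done_total", {"source": "google_ads"})
```

A process exposing `/metrics` would then read the stored values back and serve them to Prometheus in the usual pull fashion.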
- Try to always instrument your code
- Limit the cardinality of the metrics you use
- Make nice graphs!
- Use our lib: https://github.com/dolead/prometheus-distributed-client
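The cardinality advice above can be made concrete with a quick calculation. One metric can produce at most one time series per combination of label values, so the upper bound is the product of the number of values per label. The label names and value counts below are hypothetical.

```python
from math import prod


def series_count(label_values):
    """Upper bound on the time series one metric can create:
    the product of the number of distinct values of each label."""
    return prod(len(values) for values in label_values.values())


# Hypothetical label sets with a small, bounded number of values.
labels = {
    "source": ["google_ads", "fb_ads", "bing_ads", "taboola"],
    "category": ["import", "export", "report"],
    "node": [f"worker-{i}" for i in range(10)],
}
bounded = series_count(labels)

# Adding an unbounded label such as a user id multiplies everything.
with_user_id = series_count({**labels, "user_id": range(50_000)})
```

Here the bounded labels give at most 120 series, while adding the `user_id` label raises the bound to 6,000,000. Labels should therefore take values from small, fixed sets.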