Effective Monitoring with statsD
@alq, CTO at Datadog
Effective Monitoring with statsD - Alexis Lê-Quôc

  1. Effective Monitoring with statsD
  2. @alq, CTO at Datadog
  3. An application through the naked eye
  4. An application through a monitoring tool
  5. (Slides 5-10) OODA Loop (simplified): Observe → Orient → Decide → Act. Successive builds annotate the loop with one "Monitoring Tool" label and three "You" labels.
  11. (Slides 11-14) Observations need to be (1) timely, (2) correct, (3) comprehensive. Else: Garbage In, Garbage Out.
  15. (Slides 15-16) Timely (minutes, not weeks): Initial assumptions → Initial set of metrics → Contact with reality → Revised assumptions → Revised set of metrics
  17. (Slides 17-19) Comprehensive: metrics come in three layers, Resources, Work and Value. They range from "easy to collect, generic, but not actionable" to "harder to collect, custom, but most actionable".
  20. (Slides 20-22) statsD: Easy, Timely, Comprehensive
  23. How statsD works: client libraries talk to a simple UDP server using a simple text protocol. Example messages:
      • pageviews:100|c|@0.25 (counter, sampled at 25%)
      • latency:320|ms (timer, in milliseconds)
      • backlog:333|g (gauge)
      • uniques:765|s (set)
  24. statsD types (type: definition; example):
      • Gauges: absolute values; queue size
      • Counters: per-second rates; page views
      • Histograms: gauge summary; page latency
      • Timers: gauge distribution; page latency
      • Sets: counters of unique things; unique visitors
  25. statsD problems (type: definition; problem):
      • Gauges: absolute values; latest value wins. Gauge deltas???
      • Counters: per-second rates; rates, not counts (!= rrdtool)
      • Histograms: gauge summary; assumes normal distribution
      • Timers: gauge distribution; can measure much more than time
      • Sets: counters of unique things; :-)
  26. #1 pitfall: "Counters" http://dtdg.co/tokyo-counters
  27. How we use statsD http://dtdg.co/tokyo-dog
  28. Essential: Tagging http://dtdg.co/tokyo-tags
  29. How to get started:
      • statsD: https://github.com/etsy/statsd
      • Client libraries: https://github.com/etsy/statsd/wiki
      • 1-stop shop (my company): http://www.datadoghq.com
  30. Thank you very much! Questions? @alq
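
The text protocol on slide 23 is small enough to sketch by hand. The snippet below is a minimal illustration, not an official client: `format_metric` and `send_examples` are hypothetical helper names, and the server address (localhost, UDP port 8125) is an assumption about a typical statsD setup rather than something stated in the slides.

```python
import socket


def format_metric(name, value, type_code, sample_rate=None):
    """Build one statsD message such as 'pageviews:100|c|@0.25'.

    type_code is one of the codes shown on the slide:
    'c' (counter), 'ms' (timer), 'g' (gauge), 's' (set).
    """
    message = f"{name}:{value}|{type_code}"
    if sample_rate is not None:
        # Sampling is optional and appended as a third field.
        message += f"|@{sample_rate}"
    return message


def send_examples(address=("127.0.0.1", 8125)):
    """Send the four slide examples as UDP datagrams.

    Assumption: a statsD server listens at `address`. UDP is
    fire-and-forget, so nothing tells us whether anyone received them.
    """
    examples = [
        ("pageviews", 100, "c", 0.25),
        ("latency", 320, "ms", None),
        ("backlog", 333, "g", None),
        ("uniques", 765, "s", None),
    ]
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for name, value, type_code, rate in examples:
            payload = format_metric(name, value, type_code, rate)
            sock.sendto(payload.encode("ascii"), address)
    finally:
        sock.close()
```

Calling `send_examples()` emits the four messages; because the transport is UDP, the application is not blocked if the server is slow or absent, which is part of what makes instrumentation cheap to add.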

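
Two of the problems listed on slide 25 ("latest value wins" for gauges, "rates, not counts" for counters) are easier to see with a toy model of what a server does at flush time. The sketch below is a deliberately simplified assumption about that behaviour, not the real statsD implementation: `aggregate` is a hypothetical function, the 10-second flush interval is assumed, and timers, sets and gauge deltas are ignored.

```python
def aggregate(messages, flush_interval=10.0):
    """Toy model of one flush: returns (counter_rates, gauges).

    Counters are summed (scaled up by the sample rate) and reported
    as per-second rates; gauges keep only the last value seen.
    """
    counter_totals = {}
    gauges = {}
    for message in messages:
        name, rest = message.split(":", 1)
        fields = rest.split("|")
        value, type_code = float(fields[0]), fields[1]
        if type_code == "c":
            # '@0.25' means only 1 in 4 events was sent, so scale back up.
            rate = float(fields[2][1:]) if len(fields) > 2 else 1.0
            counter_totals[name] = counter_totals.get(name, 0.0) + value / rate
        elif type_code == "g":
            # Latest value wins: earlier gauge values are overwritten.
            gauges[name] = value
    counter_rates = {
        name: total / flush_interval for name, total in counter_totals.items()
    }
    return counter_rates, gauges


rates, gauges = aggregate([
    "pageviews:100|c|@0.25",
    "pageviews:50|c",
    "backlog:333|g",
    "backlog:12|g",
])
```

In this model the 450 estimated page views come out as a rate of 45 per second, not as a count, and the first `backlog` reading is silently lost, which is why a downstream tool that expects raw counts or every gauge sample can be misled.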