2. why?
"If you can not measure it, you can not improve it"
-Lord Kelvin
99.999% ("five nines") = 5.26 minutes of downtime per year
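The 5.26-minute figure follows from plain arithmetic on the availability target. A minimal sketch, assuming a year of 365.25 days; the function name is made up for illustration:

```python
# Assumption: an average year of 365.25 days (leap years averaged in).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime allowed by an availability target."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for target in (99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% -> {minutes:.2f} minutes/year")
```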
3. previously ...
Sending/collecting is complicated.
Single collection server.
Tedious to configure new metric collection or creation.
Calculating metrics from files is expensive.
4. bottlenecks ...
Poll-based collection server.
Not easy (!fun) to configure new metric collection or creation.
= grunt work for the ops engineer
uhhhh....
12. Rocksteady, metric as event
1min.juicer.common.version.sc1.jcr1 100 1276822626
INSERT INTO Deploy
SELECT * FROM Metric(name='common.revision') MATCH_RECOGNIZE (
  partition by colo, hostname
  measures A.value as revision, A.colo as colo, A.hostname as hostname,
    A.app as app, A.timestamp as timestamp
  pattern (A)
  define
    A as A.value > prev(A.value))
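To make the idea concrete, here is a rough Python sketch of what the query does: turn a metric line into an event, then flag a deploy whenever the value rises above the previous one for the same colo and hostname. This is not Rocksteady code. The path layout (retention.app.name.colo.hostname) is an assumption inferred from the sample line, and `Metric`, `parse_metric` and `detect_deploys` are hypothetical names. The second and third sample lines are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Metric:
    retention: str
    app: str
    name: str
    colo: str
    hostname: str
    value: float
    timestamp: int


def parse_metric(line: str) -> Metric:
    """Parse '<path> <value> <timestamp>' into a Metric event.

    Assumed path layout: retention.app.<name parts>.colo.hostname
    """
    path, value, timestamp = line.split()
    parts = path.split(".")
    if len(parts) < 5:
        raise ValueError(f"unexpected metric path: {path!r}")
    return Metric(
        retention=parts[0],
        app=parts[1],
        name=".".join(parts[2:-2]),
        colo=parts[-2],
        hostname=parts[-1],
        value=float(value),
        timestamp=int(timestamp),
    )


def detect_deploys(metrics: Iterable[Metric], name: str) -> List[Metric]:
    """Return events whose value is greater than the previous value seen
    for the same (colo, hostname), mirroring the partitioned pattern."""
    last_value: Dict[Tuple[str, str], float] = {}
    deploys: List[Metric] = []
    for m in metrics:
        if m.name != name:
            continue
        key = (m.colo, m.hostname)
        previous = last_value.get(key)
        # The first event per partition has no previous value, so no match.
        if previous is not None and m.value > previous:
            deploys.append(m)
        last_value[key] = m.value
    return deploys


lines = [
    "1min.juicer.common.version.sc1.jcr1 100 1276822626",
    "1min.juicer.common.version.sc1.jcr1 100 1276822686",  # invented sample
    "1min.juicer.common.version.sc1.jcr1 101 1276822746",  # invented sample
]
deploys = detect_deploys((parse_metric(l) for l in lines), name="common.version")
print(deploys)
```

The sketch filters on `common.version` because that is the name in the sample line; the query on the slide filters on `common.revision`.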
15. correlation
Deployment-related problems.
Capture sets of metrics when important ones cross a threshold.
Determine dependencies, such as CPU to requests per second or response time.
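One way to picture "capture sets of metrics when important ones cross a threshold": keep a short window of recent samples and snapshot it the moment a watched metric goes over its limit. A minimal sketch, assuming samples arrive in timestamp order; the class, metric names and numbers are all made up for illustration:

```python
from collections import deque
from typing import Deque, List, Tuple

Sample = Tuple[str, float, int]  # (metric name, value, timestamp)


class ThresholdCapture:
    """Keep a window of recent samples; snapshot it when the watched
    metric crosses its threshold."""

    def __init__(self, watched: str, threshold: float, window_seconds: int = 60):
        self.watched = watched
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.recent: Deque[Sample] = deque()
        self.snapshots: List[List[Sample]] = []
        self._above = False  # avoids re-capturing while still above threshold

    def add(self, name: str, value: float, timestamp: int) -> None:
        self.recent.append((name, value, timestamp))
        # Drop samples that fell out of the window.
        while self.recent and self.recent[0][2] < timestamp - self.window_seconds:
            self.recent.popleft()
        if name == self.watched:
            above = value > self.threshold
            if above and not self._above:
                # Crossing detected: keep everything seen in the window.
                self.snapshots.append(list(self.recent))
            self._above = above


capture = ThresholdCapture(watched="response_time", threshold=500)
capture.add("cpu", 40, 0)
capture.add("response_time", 120, 10)
capture.add("cpu", 85, 20)
capture.add("response_time", 900, 30)  # crosses the threshold
print(capture.snapshots)
```

The snapshot puts the CPU samples next to the response-time spike, which is the raw material for spotting a dependency between the two.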
18. beyond simple metric
Timing info per request.
Actual time spent in each component in an application.
Map out dependencies, find the exact area of a problem.
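Per-request timing can be as small as a context manager wrapped around each component call. A minimal sketch, not taken from the talk; `RequestTimer` and the component names are hypothetical, and the sleeps stand in for real work:

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


class RequestTimer:
    """Accumulate wall-clock time spent in each named component
    while handling one request."""

    def __init__(self) -> None:
        self.timings: Dict[str, float] = {}

    @contextmanager
    def component(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the time even if the component raised.
            elapsed = time.perf_counter() - start
            self.timings[name] = self.timings.get(name, 0.0) + elapsed

    def slowest(self) -> str:
        """Return the component with the largest accumulated time."""
        return max(self.timings, key=self.timings.get)


timer = RequestTimer()
with timer.component("cache"):
    time.sleep(0.001)  # stand-in for a cache lookup
with timer.component("db"):
    time.sleep(0.05)   # stand-in for a database query
print(timer.timings, "slowest:", timer.slowest())
```

Each entry in `timings` could then be sent as its own metric, so a slow request points at a component instead of the whole application.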
20. what we learned?
1. Make metric sending simple.
2. Nice UI to make sense of data.
3. Real-time processing of metrics rocks.
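On point 1, "simple" can mean one line of text per metric over a TCP socket. A minimal sketch, assuming the receiver accepts lines shaped like the sample shown earlier (`<path> <value> <timestamp>`); the function names are hypothetical, and the host and port are placeholders supplied by the caller:

```python
import socket
import time
from typing import Optional, Union


def format_metric(path: str, value: Union[int, float],
                  timestamp: Optional[int] = None) -> str:
    """Build one '<path> <value> <timestamp>' line, newline-terminated."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n"


def send_metric(host: str, port: int, path: str, value: Union[int, float],
                timestamp: Optional[int] = None) -> None:
    """Open a TCP connection, send one metric line, and close."""
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=2) as conn:
        conn.sendall(line.encode("ascii"))


print(format_metric("1min.juicer.common.version.sc1.jcr1", 100, 1276822626), end="")
```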