The longest 5 minutes in our life

Monitoring as retrospection/inspection/prospection


  1. The longest 5 minutes in our life.
     @tagomoris
     2013/11/30 Monitoring Casual Talks in Kyoto
     Saturday, November 30, 2013
  2. * The title is deliberately over-dramatic ("chuuni" style)
  3. TAGOMORI Satoshi (@tagomoris)
     LINE Corp.
     Hadoop, Fluentd, Norikra, ...
  4. (no text on this slide)
  5. We won ISUCON
  6. Ishikari DC tour evangelist
  7. What are the 5 minutes for?
       ISUCON
       Our new service launches
       Our services in trouble
  8. What can we do in 5 min.?
     Investigate logs! Logs! Logs!
       Hot request paths
       Heavy request paths
       How many requests? How many users?
       and, and, and ...
  9. Logs
       Retrospection: past N min. logs
       Inspection: logs being tailed right now
       Prospection: incoming N min. logs
  10. Retrospection in ISUCON
      We MUST NOT be slaves to information.
        Too much is bad.
      We MUST at least know the key factors.
        Too little is also bad.
  11. analyze_apache_logs
      Bundled with Apache::Log::Parser (on CPAN)
      Reads logs from STDIN and analyzes them
      For each method/path:
        HTTP response status code
        Response duration (avg/min/max)
        Query strings / Referers (optional)
  12. $ cat /var/log/httpd/access_log | analyze_apache_logs -s path
      TOTAL: 1801
      *  duration avg:97.33, min:76, max:110  status 200:3
      /  duration avg:73517.00, min:6617, max:134667  status 200:6
      /entry  duration avg:168814.06, min:41780, max:378686  status 200:33
      /entry/15035  duration avg:34386.00, min:34386, max:34386  status 200:1
      /follow  duration avg:171574.81, min:4032, max:610354  status 200:145
      /icon  duration avg:262889.95, min:117225, max:784451  status 200:21
      /icon/03df2637e15ff22eeb825d3aa664c2ecbf399cbc0257c94db002497d508a476c  duration avg:292981.50, min:239181, max:346782  status 200:2
      /icon/06e3640fd416acffbbc63177bf5a65b9981de8dc3aae19ca9224fcf45c6fa1f6  duration avg:270258.61, min:73933, max:492001  status 200:18
      /icon/09228075c09882cbf065a30848e79bdc3e43f7b43273be98304a5f7712aa37d8  duration avg:198728.00, min:116202, max:271046  status 200:3
      /icon/0ab3a5827c926a148ef28d572e44a878a99ceecc11296025319f21826b77f352  duration avg:250647.07, min:63798, max:503243  status 200:14
      /icon/0d5f799ba92380f94f6108521aacb50280da2a731a9d5fb19d6da1f224837a4a
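
The per-path summary on slides 11 and 12 can be pictured with a small Python sketch. This is not how analyze_apache_logs is implemented; the function name summarize_by_path, the already-parsed (path, status, duration) input, and the sample records are all assumptions made for illustration.

```python
from collections import defaultdict


def summarize_by_path(records):
    """Group (path, status, duration) records by path.

    Hypothetical helper: returns
    {path: {"avg": ..., "min": ..., "max": ..., "status": {code: count}}}.
    """
    durations = defaultdict(list)
    statuses = defaultdict(lambda: defaultdict(int))
    for path, status, duration in records:
        durations[path].append(duration)
        statuses[path][status] += 1

    summary = {}
    for path, values in durations.items():
        summary[path] = {
            "avg": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
            "status": dict(statuses[path]),
        }
    return summary


# Made-up sample records, not taken from the slide.
records = [
    ("/entry", 200, 100),
    ("/entry", 200, 300),
    ("/icon", 200, 50),
    ("/icon", 404, 150),
]

for path, stats in sorted(summarize_by_path(records).items()):
    print(path, stats)
```

Keeping only avg/min/max and status counts per path is in the spirit of slide 10: enough to spot hot and heavy paths without drowning in raw log lines.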
  13. Retrospection in action
      Shib: Hive WebUI -> MapReduce
        ex: N min. of logs from 10 min. ago
        Import lag / MapReduce lag
      Kibana: Elasticsearch WebUI
        Scalability?
      Fluentd + GrowthForecast
        without on-demand queries
  14. Retrospection: Fluentd + GrowthForecast
      HTTP Response Status
      HTTP Response Times (Avg, [50,90,95,98,99] percentiles)
  15. Inspection
      ImHacker by @cho45
      http://subtech.g.hatena.ne.jp/cho45/20120810/1344606438
  16. Prospection
      Queries for future/incoming logs
        for both access logs and application logs
        results for 5 min. of logs, just 5 min. later
  17. Norikra: Schema-less Stream Processing with SQL
  18. Norikra(1):
      Schema-less event stream:
        Add/Remove data fields whenever you want
      SQL:
        No more restarts to add/remove queries
        w/ JOINs, w/ SubQueries
        w/ UDF
      Truly complex events:
        Nested Hash/Array, accessible directly from SQL
  19. Norikra(2):
      Open source software:
        Licensed under GPLv2
        Based on Esper
        UDF plugins from rubygems.org
      Ultra-fast bootstrap & small start:
        3 min. to install/start
        1 server
  20. Norikra Queries: (1)
      query:
        SELECT name, age
        FROM events
  21. Norikra Queries: (1)
      event:
        {"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Kyoto"}
      query:
        SELECT name, age
        FROM events
      output:
        {"name":"tagomoris","age":34}
  22. Norikra Queries: (1)
      event (no "age" field):
        {"name":"tagomoris", "address":"Tokyo", "corp":"LINE", "current":"Kyoto"}
      query:
        SELECT name, age
        FROM events
      output:
        nothing
  23. Norikra Queries: (2)
      event:
        {"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Kyoto"}
      query:
        SELECT name, age
        FROM events
        WHERE current="Kyoto"
      output:
        {"name":"tagomoris","age":34}
  24. Norikra Queries: (2)
      event:
        {"name":"secondlife", "age":99, "address":"Tokyo", "corp":"Cookpad", "current":"Nara"}
      query:
        SELECT name, age
        FROM events
        WHERE current="Kyoto"
      output:
        nothing
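
Slides 20 to 24 show two reasons an event produces no output: it lacks a field the query refers to, or it fails the WHERE condition. The Python sketch below models only that observable behavior; it says nothing about Norikra's internals, and run_query with its equality-only "where" argument is a hypothetical helper.

```python
def run_query(event, fields, where=None):
    """Return the selected fields, or None when the event does not match.

    Simplified model (an assumption, not Norikra's implementation):
    an event matches only if it has every field the query refers to
    and every "where" field equals the required value.
    """
    where = where or {}
    needed = set(fields) | set(where)
    if not needed.issubset(event):
        return None  # a referenced field is missing, as on slide 22
    if any(event[key] != value for key, value in where.items()):
        return None  # condition not satisfied, as on slide 24
    return {name: event[name] for name in fields}


full = {"name": "tagomoris", "age": 34, "address": "Tokyo",
        "corp": "LINE", "current": "Kyoto"}
no_age = {"name": "tagomoris", "address": "Tokyo",
          "corp": "LINE", "current": "Kyoto"}
other = {"name": "secondlife", "age": 99, "address": "Tokyo",
         "corp": "Cookpad", "current": "Nara"}

print(run_query(full, ["name", "age"]))
print(run_query(no_age, ["name", "age"]))
print(run_query(other, ["name", "age"], where={"current": "Kyoto"}))
```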
  25. Norikra Queries: (3)
      query:
        SELECT age, COUNT(*) as cnt
        FROM events.win:time_batch(5 mins)
        GROUP BY age
  26. Norikra Queries: (3)
      event:
        {"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Kyoto"}
      query:
        SELECT age, COUNT(*) as cnt
        FROM events.win:time_batch(5 mins)
        GROUP BY age
      output (every 5 mins):
        {"age":34,"cnt":3}, {"age":33,"cnt":1}, ...
  27. Norikra Queries: (4)
      event:
        {"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Kyoto"}
      queries:
        SELECT age, COUNT(*) as cnt
        FROM events.win:time_batch(5 mins)
        GROUP BY age

        SELECT max(age) as max
        FROM events.win:time_batch(5 mins)
      output (every 5 mins):
        {"age":34,"cnt":3}, {"age":33,"cnt":1}, ...
        {"max":51}
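
To make the idea of a 5-minute batch window with GROUP BY concrete, here is a minimal Python sketch. It is an illustration only: time_batch_counts is a hypothetical function, timestamps are plain seconds, and windows are aligned to multiples of the window length, which is a simplification (a real stream engine may start its windows differently and emits results as time passes rather than over a finished list).

```python
from collections import Counter


def time_batch_counts(events, window_seconds, key):
    """Count events per key value in consecutive fixed-length windows.

    events: iterable of (timestamp_seconds, event_dict).
    Returns a list of (window_start, {key_value: count}), ordered by window.
    """
    batches = {}
    for timestamp, event in events:
        # Assumption: windows are aligned to multiples of window_seconds.
        start = timestamp - (timestamp % window_seconds)
        batches.setdefault(start, Counter())[event[key]] += 1
    return [(start, dict(counts)) for start, counts in sorted(batches.items())]


# Made-up events: four fall in the first 5-minute window, one in the second.
events = [
    (0, {"name": "a", "age": 34}),
    (60, {"name": "b", "age": 34}),
    (120, {"name": "c", "age": 33}),
    (299, {"name": "d", "age": 34}),
    (300, {"name": "e", "age": 33}),
]

for start, counts in time_batch_counts(events, 300, "age"):
    print(start, counts)
```

Each window yields one row per distinct age, which corresponds to the shape of the output on slide 26.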
  28. Norikra Queries: (5)
      event:
        {"name":"tagomoris",
         "user":{"age":34, "corp":"LINE", "address":"Tokyo"},
         "current":"Kyoto", "speaker":true,
         "attend":[true,true,false, ...] }
      query:
        SELECT age, COUNT(*) as cnt
        FROM events.win:time_batch(5 mins)
        GROUP BY age
  29. Norikra Queries: (5)
      event:
        {"name":"tagomoris",
         "user":{"age":34, "corp":"LINE", "address":"Tokyo"},
         "current":"Kyoto", "speaker":true,
         "attend":[true,true,false, ...] }
      query:
        SELECT user.age, COUNT(*) as cnt
        FROM events.win:time_batch(5 mins)
        GROUP BY user.age
  30. Norikra Queries: (5)
      event:
        {"name":"tagomoris",
         "user":{"age":34, "corp":"LINE", "address":"Tokyo"},
         "current":"Kyoto", "speaker":true,
         "attend":[true,true,false, ...] }
      query:
        SELECT user.age, COUNT(*) as cnt
        FROM events.win:time_batch(5 mins)
        WHERE current="Kyoto" AND attend.$0 AND attend.$1
        GROUP BY user.age
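
The nested access used on slides 29 and 30 (user.age for a hash member, attend.$0 for an array element) can be mimicked in a few lines of Python. This is a sketch of the addressing idea only; get_path is a hypothetical helper, and the three-element "attend" list stands in for the list that is elided on the slide.

```python
def get_path(event, path):
    """Resolve a dotted path such as "user.age" or "attend.$0".

    A "$N" segment is treated as a list index, mirroring the slide syntax.
    Returns None when any segment cannot be resolved.
    """
    current = event
    for segment in path.split("."):
        if segment.startswith("$"):
            index = int(segment[1:])
            if not isinstance(current, list) or index >= len(current):
                return None
            current = current[index]
        elif isinstance(current, dict) and segment in current:
            current = current[segment]
        else:
            return None
    return current


event = {
    "name": "tagomoris",
    "user": {"age": 34, "corp": "LINE", "address": "Tokyo"},
    "current": "Kyoto",
    "speaker": True,
    "attend": [True, True, False],  # shortened; the slide elides the rest
}

# Equivalent of: WHERE current="Kyoto" AND attend.$0 AND attend.$1
matches = (get_path(event, "current") == "Kyoto"
           and get_path(event, "attend.$0")
           and get_path(event, "attend.$1"))
print(get_path(event, "user.age"), bool(matches))
```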
  31. Before: Hive
      EVERY HOUR!
        SELECT
          yyyymmdd, hh, campaign_id, region, lang,
          count(*) AS click,
          count(distinct member_id) AS uu
        FROM (
          SELECT
            yyyymmdd, hh,
            get_json_object(log, '$.campaign.id') AS campaign_id,
            get_json_object(log, '$.member.region') AS region,
            get_json_object(log, '$.member.lang') AS lang,
            get_json_object(log, '$.member.id') AS member_id
          FROM applog
          WHERE service='myservice'
            AND yyyymmdd='20131101' AND hh='00'
            AND get_json_object(log, '$.type') = 'click'
        ) x
        GROUP BY yyyymmdd, hh, campaign_id, region, lang
  32. After: Norikra
        SELECT
          campaign.id AS campaign_id, member.region AS region,
          count(*) AS click,
          count(distinct member.id) AS uu
        FROM myservice.win:time_batch(1 hours)
        WHERE type="click"
        GROUP BY campaign.id, member.region
  33. Before: Fluentd
      FOR EACH SERVICE
        <match for.target.service>
          type numeric_monitor
          unit minute
          tag service.response
          output_key_prefix request_api
          aggregate all
          monitor_key api_response_time
          percentiles 50,90,95,98,99
        </match>
      ... AND RESTART OF FLUENTD!!!!!!!!!!!!!!
  34. After: Norikra
      FOR EACH SERVICE!
        SELECT
          percentiles(api_response_time, [50,90,95,98,99]) AS p
        FROM target_service.win:time_batch(1 min)
      WITHOUT ANY RESTARTS!
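
Both the Fluentd configuration and the Norikra query above report the 50/90/95/98/99 percentiles of api_response_time per minute. As a rough picture of what such a summary contains, here is a Python sketch using the nearest-rank method. The method is an assumption chosen for simplicity; the slides do not say which percentile definition either tool uses, and the sample values are made up.

```python
def percentile(sorted_values, p):
    """Nearest-rank percentile of a non-empty sorted list.

    p must be an integer with 0 < p <= 100. Integer arithmetic is used
    for the rank to avoid floating-point rounding surprises.
    """
    n = len(sorted_values)
    rank = (p * n + 99) // 100  # ceil(p * n / 100)
    return sorted_values[max(rank, 1) - 1]


def percentiles(values, points):
    """Return {p: nearest-rank percentile} for each integer p in points."""
    ordered = sorted(values)
    return {p: percentile(ordered, p) for p in points}


# Made-up response times for one window: 1..100 in arbitrary units.
response_times = list(range(1, 101))
print(percentiles(response_times, [50, 90, 95, 98, 99]))
```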
  35. Conclusion
      Retrospection is important
        We now have many methods for retrospection
      Prospection is also important
        For complex logs
        For immediate reports
        For less system management