The longest 5 minutes in our life

Monitoring as retrospection/inspection/prospection

  1. The longest 5 minutes in our life. @tagomoris, 2013/11/30, Monitoring Casual Talks in Kyoto (Saturday, November 30, 2013)
  2. Note: the title is pure chūnibyō (deliberately melodramatic).
  3. TAGOMORI Satoshi (@tagomoris), LINE Corp. Hadoop, Fluentd, Norikra, ...
  4. (image-only slide)
  5. We won ISUCON!
  6. Ishikari DC tour evangelist
  7. What are the 5 minutes for? ISUCON; our new service launches; our services in trouble
  8. What can we do in 5 min.? Investigate logs! Logs! Logs! Hot request paths, heavy request paths, how many requests? How many users? And, and, and ...
  9. Logs. Retrospection: logs from the past N min.; Inspection: logs being tailed right now; Prospection: logs arriving over the next N min.
  10. Retrospection in ISUCON: We MUST NOT become slaves to information; too much is worse. We MUST at least know the key factors; too little is worse.
  11. analyze_apache_logs: bundled with Apache::Log::Parser (on CPAN). Reads logs from STDIN and analyzes them. For each method/path it reports HTTP response status codes, response duration (avg/min/max), and query strings / referers (optional).
  12. $ cat /var/log/httpd/access_log | analyze_apache_logs -s path
        TOTAL: 1801
        *
            duration avg:97.33, min:76, max:110
            status 200:3
        /
            duration avg:73517.00, min:6617, max:134667
            status 200:6
        /entry
            duration avg:168814.06, min:41780, max:378686
            status 200:33
        /entry/15035
            duration avg:34386.00, min:34386, max:34386
            status 200:1
        /follow
            duration avg:171574.81, min:4032, max:610354
            status 200:145
        /icon
            duration avg:262889.95, min:117225, max:784451
            status 200:21
        /icon/03df2637e15ff22eeb825d3aa664c2ecbf399cbc0257c94db002497d508a476c
            duration avg:292981.50, min:239181, max:346782
            status 200:2
        /icon/06e3640fd416acffbbc63177bf5a65b9981de8dc3aae19ca9224fcf45c6fa1f6
            duration avg:270258.61, min:73933, max:492001
            status 200:18
        /icon/09228075c09882cbf065a30848e79bdc3e43f7b43273be98304a5f7712aa37d8
            duration avg:198728.00, min:116202, max:271046
            status 200:3
        /icon/0ab3a5827c926a148ef28d572e44a878a99ceecc11296025319f21826b77f352
            duration avg:250647.07, min:63798, max:503243
            status 200:14
        /icon/0d5f799ba92380f94f6108521aacb50280da2a731a9d5fb19d6da1f224837a4a
  13. Retrospection in action. Shib (a Hive WebUI) -> MapReduce: e.g. N min. of logs from 10 min. ago, with import lag and MapReduce lag. Kibana (an Elasticsearch WebUI): scalability? Fluentd + GrowthForecast: no on-demand queries.
  14. Retrospection: Fluentd + GrowthForecast. HTTP response status; HTTP response times (avg, 50/90/95/98/99th percentiles)
  15. Inspection: ImHacker by @cho45 http://subtech.g.hatena.ne.jp/cho45/20120810/1344606438
  16. Prospection: queries over future/incoming logs, both access logs and application logs. Results for 5 min. of logs arrive exactly 5 min. later.
  17. Norikra: Schema-less Stream Processing with SQL
  18. Norikra (1). Schema-less event stream: add/remove data fields whenever you want. SQL: no more restarts to add/remove queries; JOINs, subqueries, and UDFs supported. Truly complex events: nested hashes/arrays, accessible directly from SQL.
  19. Norikra (2). Open source software, licensed under GPLv2. Based on Esper. UDF plugins from rubygems.org. Ultra-fast bootstrap and small start: 3 min. to install and start 1 server.
  20. Norikra Queries (1):
        Query:  SELECT name, age FROM events
  21. Norikra Queries (1):
        Event:  {"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Kyoto"}
        Query:  SELECT name, age FROM events
        Output: {"name":"tagomoris","age":34}
  22. Norikra Queries (1):
        Event:  {"name":"tagomoris", "address":"Tokyo", "corp":"LINE", "current":"Kyoto"}
        Query:  SELECT name, age FROM events
        Output: nothing
  23. Norikra Queries (2):
        Event:  {"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Kyoto"}
        Query:  SELECT name, age FROM events WHERE current="Kyoto"
        Output: {"name":"tagomoris","age":34}
  24. Norikra Queries (2):
        Event:  {"name":"secondlife", "age":99, "address":"Tokyo", "corp":"Cookpad", "current":"Nara"}
        Query:  SELECT name, age FROM events WHERE current="Kyoto"
        Output: nothing
  25. Norikra Queries (3):
        Query:  SELECT age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) GROUP BY age
  26. Norikra Queries (3):
        Event:  {"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Kyoto"}
        Query:  SELECT age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) GROUP BY age
        Output (every 5 min.): {"age":34,"cnt":3}, {"age":33,"cnt":1}, ...
  27. Norikra Queries (4):
        Event:  {"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Kyoto"}
        Query:  SELECT age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) GROUP BY age
        Query:  SELECT max(age) as max FROM events.win:time_batch(5 mins)
        Output (every 5 min.): {"age":34,"cnt":3}, {"age":33,"cnt":1}, ...
        Output (every 5 min.): {"max":51}
  28. Norikra Queries (5):
        Event:  {"name":"tagomoris", "user":{"age":34, "corp":"LINE", "address":"Tokyo"}, "current":"Kyoto", "speaker":true, "attend":[true,true,false, ...]}
        Query:  SELECT age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) GROUP BY age
  29. Norikra Queries (5):
        Event:  {"name":"tagomoris", "user":{"age":34, "corp":"LINE", "address":"Tokyo"}, "current":"Kyoto", "speaker":true, "attend":[true,true,false, ...]}
        Query:  SELECT user.age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) GROUP BY user.age
  30. Norikra Queries (5):
        Event:  {"name":"tagomoris", "user":{"age":34, "corp":"LINE", "address":"Tokyo"}, "current":"Kyoto", "speaker":true, "attend":[true,true,false, ...]}
        Query:  SELECT user.age, COUNT(*) as cnt FROM events.win:time_batch(5 mins)
                WHERE current="Kyoto" AND attend.$0 AND attend.$1
                GROUP BY user.age
  31. Before: Hive, EVERY HOUR!
        SELECT yyyymmdd, hh, campaign_id, region, lang,
               count(*) AS click, count(distinct member_id) AS uu
        FROM (
          SELECT yyyymmdd, hh,
                 get_json_object(log, '$.campaign.id') AS campaign_id,
                 get_json_object(log, '$.member.region') AS region,
                 get_json_object(log, '$.member.lang') AS lang,
                 get_json_object(log, '$.member.id') AS member_id
          FROM applog
          WHERE service='myservice' AND yyyymmdd='20131101' AND hh='00'
            AND get_json_object(log, '$.type') = 'click'
        ) x
        GROUP BY yyyymmdd, hh, campaign_id, region, lang
  32. After: Norikra
        SELECT campaign.id AS campaign_id, member.region AS region,
               count(*) AS click, count(distinct member.id) AS uu
        FROM myservice.win:time_batch(1 hours)
        WHERE type="click"
        GROUP BY campaign.id, member.region
  33. Before: Fluentd, FOR EACH SERVICE
        <match for.target.service>
          type numeric_monitor
          unit minute
          tag service.response
          output_key_prefix request_api
          aggregate all
          monitor_key api_response_time
          percentiles 50,90,95,98,99
        </match>
        ... AND RESTART FLUENTD!!!!!!!!!!!!!!
  34. After: Norikra, FOR EACH SERVICE!
        SELECT percentiles(api_response_time, [50,90,95,98,99]) AS p
        FROM target_service.win:time_batch(1 min)
        WITHOUT ANY RESTARTS!
  35. Conclusion: Retrospection is important, and we now have many methods for it. Prospection is also important: for complex logs, for immediate reports, and for less system management.
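Slides 11 and 12 describe what analyze_apache_logs reports per path: status-code counts and response duration avg/min/max. The following is a minimal Python sketch of that kind of aggregation, not the Apache::Log::Parser implementation. It assumes a hypothetical log format (Apache common log format with the response time `%D` appended as the last field); `analyze` and `print_report` are illustrative names.

```python
import re
from collections import defaultdict

# Hypothetical log format (an assumption, not taken from the slides):
# Apache common log format with the response time (%D) appended, i.e.
#   LogFormat "%h %l %u %t \"%r\" %>s %b %D"
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] '
    r'"\S+ (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ (?P<duration>\d+)$'
)


def analyze(lines):
    """Count requests and collect status codes and durations per path."""
    stats = defaultdict(lambda: {"durations": [], "status": defaultdict(int)})
    total = 0
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip lines that do not match the assumed format
        total += 1
        path = m.group("path").split("?", 1)[0]  # ignore query strings
        entry = stats[path]
        entry["durations"].append(int(m.group("duration")))
        entry["status"][m.group("status")] += 1

    report = {}
    for path, entry in stats.items():
        durations = entry["durations"]
        report[path] = {
            "avg": sum(durations) / len(durations),
            "min": min(durations),
            "max": max(durations),
            "status": dict(entry["status"]),
        }
    return total, report


def print_report(total, report):
    """Print a summary shaped roughly like the output on slide 12."""
    print(f"TOTAL: {total}")
    for path in sorted(report):
        r = report[path]
        print(path)
        print(f"\tduration avg:{r['avg']:.2f}, min:{r['min']}, max:{r['max']}")
        for status, count in sorted(r["status"].items()):
            print(f"\tstatus {status}:{count}")
```

To mirror the pipeline on slide 12, a small script could call `print_report(*analyze(sys.stdin))` and be fed with `cat /var/log/httpd/access_log`. Lines in any other format are silently skipped, so the log format assumption matters.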
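Slides 25 to 27 rely on Esper's `win:time_batch` window combined with GROUP BY. As a rough illustration only, the Python sketch below approximates a tumbling batch plus a per-value count over a list of timestamped events. It is a simplification: the real window in Norikra/Esper is driven by the engine's clock rather than by timestamps carried in the events, so its batch boundaries will generally differ from the aligned boundaries used here. `time_batch_counts` is a hypothetical name.

```python
from collections import Counter


def time_batch_counts(events, key, batch_seconds):
    """Tumbling-window count per value of `key`, roughly like
    SELECT key, COUNT(*) AS cnt FROM events.win:time_batch(N) GROUP BY key.

    `events` yields (timestamp_in_seconds, record) pairs in time order.
    Each batch is reported as (batch_start, rows) when the first event of
    the next batch arrives, or at the end of the input.
    """
    batch_start = None
    counts = Counter()
    for ts, record in events:
        start = ts - ts % batch_seconds  # batches aligned to multiples of N (an assumption)
        if batch_start is not None and start != batch_start:
            yield batch_start, [{key: v, "cnt": n} for v, n in counts.items()]
            counts = Counter()
        batch_start = start
        if key in record:  # records without the field are not counted
            counts[record[key]] += 1
    if batch_start is not None:
        yield batch_start, [{key: v, "cnt": n} for v, n in counts.items()]
```

With `batch_seconds=300`, this yields one list of `{"age": ..., "cnt": ...}` rows per 5-minute batch, which is the shape of the output on slide 26.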
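Slides 14, 33 and 34 report response-time percentiles (50/90/95/98/99) per minute. The Python sketch below computes nearest-rank percentiles over 1-minute batches. The nearest-rank definition is an assumption: the `numeric_monitor` Fluentd output on slide 33 and the `percentiles()` function on slide 34 may compute percentiles differently (for example by interpolating). `percentiles` and `per_minute_percentiles` are illustrative names.

```python
import math
from collections import defaultdict


def percentiles(values, ps):
    """Nearest-rank percentiles: for each p, the smallest sample such that
    at least p% of all samples are less than or equal to it."""
    if not values:
        return {}
    ordered = sorted(values)
    n = len(ordered)
    # 1-based rank ceil(p * n / 100), clamped to at least 1 for p == 0
    return {p: ordered[max(1, math.ceil(p * n / 100)) - 1] for p in ps}


def per_minute_percentiles(samples, ps=(50, 90, 95, 98, 99)):
    """Group (timestamp_in_seconds, response_time) samples into 1-minute
    batches and compute percentiles for each batch."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % 60].append(value)
    return {minute: percentiles(vals, ps) for minute, vals in sorted(buckets.items())}
```

Nearest-rank always returns an observed sample, which keeps the numbers easy to check by hand; interpolating methods can return values that never occurred.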
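Slides 31 and 32 compute per-campaign, per-region click counts and unique users (`count(distinct member.id)`). The Python sketch below shows what one hourly batch of the Norikra query on slide 32 amounts to, assuming events are nested dicts shaped like the JSON on the earlier slides. `click_report` and the sample field layout are hypothetical.

```python
from collections import defaultdict


def click_report(events):
    """One batch of:
      SELECT campaign.id AS campaign_id, member.region AS region,
             count(*) AS click, count(distinct member.id) AS uu
      FROM myservice.win:time_batch(1 hours)
      WHERE type="click" GROUP BY campaign.id, member.region
    `events` holds the nested-dict events collected during one batch."""
    clicks = defaultdict(int)   # count(*) per (campaign_id, region)
    members = defaultdict(set)  # distinct member ids per (campaign_id, region)
    for e in events:
        if e.get("type") != "click":
            continue
        try:
            key = (e["campaign"]["id"], e["member"]["region"])
            member_id = e["member"]["id"]
        except (KeyError, TypeError):
            continue  # skip events missing a field the query references
        clicks[key] += 1
        members[key].add(member_id)
    return [
        {"campaign_id": c, "region": r, "click": clicks[(c, r)], "uu": len(members[(c, r)])}
        for (c, r) in clicks
    ]
```

Unlike the hourly Hive job on slide 31, nothing here re-reads stored logs: each event is counted once as it arrives, and the set of member ids is the only per-group state that grows.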