The longest 5 minutes
in our life.
@tagomoris
2013/11/30 Monitoring Casual Talks in Kyoto

Saturday, November 30, 2013
* Note: the title is deliberately over-dramatic.
TAGOMORI Satoshi (@tagomoris)
LINE Corp.
Hadoop, Fluentd, Norikra, ...

We won ISUCON

Evangelist for the Ishikari DC tour

What are the 5 min. for?
ISUCON
Our new service launches
Our services in trouble

What can we do in
5 min.?
Investigate logs! Logs! Logs!
Hot request paths
Heavy request paths
How many requests? How many users?
and, and, and ...

Logs
Retrospection: logs from the past N min.
Inspection: logs being tailed right now
Prospection: logs arriving in the next N min.

Retrospection
in ISUCON
We MUST NOT be slaves to information.
Too much is bad.
We MUST at least know the key factors.
Too little is bad, too.

analyze_apache_logs
Bundled with Apache::Log::Parser (in CPAN)
Reads logs from STDIN and analyzes them
For each method/path:
HTTP response status code
Response duration (avg/min/max)
Query Strings / Referers (option)
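
The per-path summary described above can be sketched in a few lines. This is not the analyze_apache_logs implementation (that tool is Perl); it is a minimal illustration that assumes a "combined"-style access log with the response duration in microseconds appended as the last field. The regular expression, the `analyze` function, and the sample lines are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Assumption: request line in quotes, then status and size, and the
# duration (microseconds) as the final whitespace-separated field.
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+.* (?P<duration>\d+)$'
)


def analyze(lines):
    """Group log lines by path and report duration avg/min/max and status counts."""
    durations = defaultdict(list)
    statuses = defaultdict(Counter)
    for line in lines:
        m = LINE_RE.search(line.strip())
        if m is None:
            continue  # skip lines that do not look like access log entries
        path = m.group("path").split("?", 1)[0]  # drop the query string
        durations[path].append(int(m.group("duration")))
        statuses[path][m.group("status")] += 1
    report = {}
    for path, values in durations.items():
        report[path] = {
            "avg": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
            "status": dict(statuses[path]),
        }
    return report


# Hypothetical sample lines, invented for this sketch.
SAMPLE = [
    '127.0.0.1 - - [30/Nov/2013:10:00:00 +0900] "GET /entry HTTP/1.1" 200 512 "-" "curl" 41780',
    '127.0.0.1 - - [30/Nov/2013:10:00:01 +0900] "GET /entry?page=2 HTTP/1.1" 200 512 "-" "curl" 378686',
    '127.0.0.1 - - [30/Nov/2013:10:00:02 +0900] "POST /follow HTTP/1.1" 302 0 "-" "curl" 4032',
]

if __name__ == "__main__":
    for path, row in sorted(analyze(SAMPLE).items()):
        print(path, row)
```

Keeping every duration in a list is fine for a few minutes of logs; a long-running variant would track count, sum, min, and max instead.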

$ cat /var/log/httpd/access_log | analyze_apache_logs -s path
TOTAL: 1801
*  duration avg:97.33, min:76, max:110  status 200:3
/  duration avg:73517.00, min:6617, max:134667  status 200:6
/entry  duration avg:168814.06, min:41780, max:378686  status 200:33
/entry/15035  duration avg:34386.00, min:34386, max:34386  status 200:1
/follow  duration avg:171574.81, min:4032, max:610354  status 200:145
/icon  duration avg:262889.95, min:117225, max:784451  status 200:21
/icon/03df2637e15ff22eeb825d3aa664c2ecbf399cbc0257c94db002497d508a476c  duration avg:292981.50, min:239181, max:346782  status 200:2
/icon/06e3640fd416acffbbc63177bf5a65b9981de8dc3aae19ca9224fcf45c6fa1f6  duration avg:270258.61, min:73933, max:492001  status 200:18
/icon/09228075c09882cbf065a30848e79bdc3e43f7b43273be98304a5f7712aa37d8  duration avg:198728.00, min:116202, max:271046  status 200:3
/icon/0ab3a5827c926a148ef28d572e44a878a99ceecc11296025319f21826b77f352  duration avg:250647.07, min:63798, max:503243  status 200:14
/icon/0d5f799ba92380f94f6108521aacb50280da2a731a9d5fb19d6da1f224837a4a

Retrospection
in action
Shib: Hive WebUI -> mapreduce
e.g. N min. of logs from 10 min. ago
Import lag / MapReduce lag
Kibana: Elasticsearch WebUI
Scalability?
Fluentd + GrowthForecast
without on-demand queries
Retrospection:
Fluentd+GrowthForecast
HTTP Response Status

HTTP Response Times (Avg, [50,90,95,98,99]%tiles)
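
As a rough illustration of what such a response-time graph plots, the sketch below computes the average and the listed percentiles for one batch of response times. It uses the nearest-rank definition of a percentile, which is an assumption made here for simplicity and not necessarily what the Fluentd plugins compute; the function names are hypothetical.

```python
def percentile(sorted_values, p):
    """Nearest-rank percentile for an integer p in 1..100 over sorted values."""
    if not sorted_values:
        raise ValueError("no values")
    # Integer ceiling of p * n / 100, avoiding floating-point rounding.
    rank = (p * len(sorted_values) + 99) // 100
    return sorted_values[max(rank, 1) - 1]


def summarize(response_times, percentiles=(50, 90, 95, 98, 99)):
    """Return the average and the requested percentiles for one batch."""
    values = sorted(response_times)
    summary = {"avg": sum(values) / len(values)}
    for p in percentiles:
        summary["p%d" % p] = percentile(values, p)
    return summary


if __name__ == "__main__":
    # Hypothetical batch: response times of 1..100 (any unit).
    print(summarize(range(1, 101)))
```

The high percentiles are what make slow outliers visible; the average alone hides them.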

Inspection
ImHacker by @cho45
http://subtech.g.hatena.ne.jp/cho45/20120810/1344606438

Prospection
Queries for future/incoming logs
both access logs and application logs
results for 5 min. of logs, just 5 min. later

Norikra:
Schema-less Stream
Processing with SQL
Norikra(1):
Schema-less event stream:
Add/Remove data fields whenever you want

SQL:
No more restarts to add/remove queries
w/ JOINs, w/ SubQueries
w/ UDF

Truly Complex events:
Nested Hash/Array, accessible directly from SQL
Norikra(2):
Open source software:
Licensed under GPLv2
Based on Esper
UDF plugins from rubygems.org

Ultra-fast bootstrap & small start:
3mins to install/start
1 server

Norikra Queries: (1)

SELECT name, age
FROM events

Norikra Queries: (1)
{"name":"tagomoris",
"age":34, "address":"Tokyo",
"corp":"LINE", "current":"Kyoto"}

SELECT name, age
FROM events

{"name":"tagomoris","age":34}

Norikra Queries: (1)
{"name":"tagomoris",
"address":"Tokyo",
"corp":"LINE", "current":"Kyoto"}

SELECT name, age
FROM events

nothing

Norikra Queries: (2)
{"name":"tagomoris",
"age":34, "address":"Tokyo",
"corp":"LINE", "current":"Kyoto"}

SELECT name, age
FROM events
WHERE current="Kyoto"

{"name":"tagomoris","age":34}

Norikra Queries: (2)
{"name":"secondlife",
"age":99, "address":"Tokyo",
"corp":"Cookpad", "current":"Nara"}

SELECT name, age
FROM events
WHERE current="Kyoto"

nothing
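
The behavior shown on these slides can be modeled with a toy function: a query produces output only for events that carry every field it references and that satisfy its WHERE condition. This is a sketch of the semantics as presented here, not of how Norikra or Esper evaluate queries; `run_query` and its parameters are hypothetical.

```python
def run_query(event, select_fields, where=None):
    """Return the projected row, or None when the event yields nothing.

    event: dict for one incoming event
    select_fields: field names in the SELECT list
    where: optional dict of field -> required value (equality only)
    """
    where = where or {}
    needed = set(select_fields) | set(where)
    if not needed.issubset(event):
        return None  # a referenced field is missing: the query ignores the event
    if any(event[field] != value for field, value in where.items()):
        return None  # the WHERE condition is not satisfied
    return {field: event[field] for field in select_fields}


if __name__ == "__main__":
    tagomoris = {"name": "tagomoris", "age": 34, "address": "Tokyo",
                 "corp": "LINE", "current": "Kyoto"}
    print(run_query(tagomoris, ["name", "age"], {"current": "Kyoto"}))
```

Extra fields such as "address" or "corp" never need to be declared anywhere, which is the point of the schema-less stream.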

Norikra Queries: (3)

SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age

Norikra Queries: (3)
{"name":"tagomoris",
"age":34, "address":"Tokyo",
"corp":"LINE", "current":"Kyoto"}

SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age

every 5 mins
{"age":34,"cnt":3}, {"age":33,"cnt":1}, ...
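
To make the "every 5 mins" output concrete, here is a minimal offline sketch of a batch window with GROUP BY and COUNT. It assumes each event comes with a timestamp in seconds and that windows are aligned to multiples of the window length; a real stream engine decides window boundaries itself and emits results as time passes, so treat `time_batch_counts` as a hypothetical illustration only.

```python
from collections import Counter


def time_batch_counts(events, window_seconds=300, key="age"):
    """Count events per value of `key` within each fixed-length window.

    events: iterable of (timestamp_seconds, event_dict) pairs
    Returns a list of (window_start, {key_value: count}), ordered by window.
    """
    windows = {}
    for timestamp, event in events:
        if key not in event:
            continue  # events without the grouped field are not counted
        start = int(timestamp // window_seconds) * window_seconds
        windows.setdefault(start, Counter())[event[key]] += 1
    return [(start, dict(windows[start])) for start in sorted(windows)]


if __name__ == "__main__":
    # Hypothetical events: three in the first 5 minutes, one in the next.
    stream = [
        (0, {"name": "a", "age": 34}),
        (10, {"name": "b", "age": 34}),
        (299, {"name": "c", "age": 33}),
        (300, {"name": "d", "age": 34}),
    ]
    for start, counts in time_batch_counts(stream):
        print(start, counts)
```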
Norikra Queries: (4)
{"name":"tagomoris",
"age":34, "address":"Tokyo",
"corp":"LINE", "current":"Kyoto"}

SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age

SELECT max(age) as max
FROM events.win:time_batch(5 mins)

every 5 mins
{"age":34,"cnt":3}, {"age":33,"cnt":1}, ...
{"max":51}

Norikra Queries: (5)
{"name":"tagomoris",
"user":{"age":34, "corp":"LINE",
"address":"Tokyo"},
"current":"Kyoto",
"speaker":true,
"attend":[true,true,false, ...]
}

SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age

Norikra Queries: (5)
{"name":"tagomoris",
"user":{"age":34, "corp":"LINE",
"address":"Tokyo"},
"current":"Kyoto",
"speaker":true,
"attend":[true,true,false, ...]
}

SELECT user.age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY user.age

Norikra Queries: (5)
{"name":"tagomoris",
"user":{"age":34, "corp":"LINE",
"address":"Tokyo"},
"current":"Kyoto",
"speaker":true,
"attend":[true,true,false, ...]
}

SELECT user.age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
WHERE current="Kyoto" AND attend.$0 AND attend.$1
GROUP BY user.age
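
The nested access used in these queries (`user.age` for a hash member, `attend.$0` for an array element) can be pictured as a path lookup. The helper below is a hypothetical sketch of that idea and does not reflect Norikra internals; it returns None for any path that cannot be resolved, which means it cannot tell a missing field from an explicit null.

```python
def lookup(event, path):
    """Resolve a dotted path such as 'user.age' or 'attend.$0' in a nested event."""
    current = event
    for part in path.split("."):
        if part.startswith("$"):
            # '$N' addresses the N-th element of an array
            index = int(part[1:])
            if not isinstance(current, list) or index >= len(current):
                return None
            current = current[index]
        else:
            if not isinstance(current, dict) or part not in current:
                return None
            current = current[part]
    return current


if __name__ == "__main__":
    event = {
        "name": "tagomoris",
        "user": {"age": 34, "corp": "LINE", "address": "Tokyo"},
        "current": "Kyoto",
        "speaker": True,
        "attend": [True, True, False],
    }
    # Mirrors: WHERE current="Kyoto" AND attend.$0 AND attend.$1
    matches = (lookup(event, "current") == "Kyoto"
               and lookup(event, "attend.$0")
               and lookup(event, "attend.$1"))
    print(matches, lookup(event, "user.age"))
```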

Before: Hive

EVERY HOUR!

SELECT
yyyymmdd, hh, campaign_id, region, lang,
count(*) AS click,
count(distinct member_id) AS uu
FROM (
SELECT
yyyymmdd,
hh,
get_json_object(log, '$.campaign.id') AS campaign_id,
get_json_object(log, '$.member.region') AS region,
get_json_object(log, '$.member.lang') AS lang,
get_json_object(log, '$.member.id') AS member_id
FROM applog
WHERE service='myservice'
AND yyyymmdd='20131101' AND hh='00'
AND get_json_object(log, '$.type') = 'click'
) x
GROUP BY yyyymmdd, hh, campaign_id, region, lang
After: Norikra
SELECT
campaign.id AS campaign_id, member.region AS region,
count(*) AS click,
count(distinct member.id) AS uu
FROM myservice.win:time_batch(1 hours)
WHERE type="click"
GROUP BY campaign.id, member.region
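
For readers less familiar with the two aggregates, the sketch below shows what `count(*) AS click` and `count(distinct member.id) AS uu` compute for one window of events, grouped by campaign and region. The function name and the sample events are hypothetical, and the window handling is left out entirely.

```python
from collections import defaultdict


def clicks_and_uu(events):
    """Per (campaign id, region): number of clicks and number of unique members."""
    clicks = defaultdict(int)
    members = defaultdict(set)
    for event in events:
        if event.get("type") != "click":
            continue  # corresponds to WHERE type="click"
        key = (event["campaign"]["id"], event["member"]["region"])
        clicks[key] += 1
        members[key].add(event["member"]["id"])  # a set keeps ids distinct
    return {key: {"click": clicks[key], "uu": len(members[key])} for key in clicks}


if __name__ == "__main__":
    # Hypothetical events for a single window.
    window = [
        {"type": "click", "campaign": {"id": 1}, "member": {"id": "u1", "region": "JP"}},
        {"type": "click", "campaign": {"id": 1}, "member": {"id": "u1", "region": "JP"}},
        {"type": "click", "campaign": {"id": 1}, "member": {"id": "u2", "region": "JP"}},
        {"type": "view", "campaign": {"id": 1}, "member": {"id": "u3", "region": "JP"}},
    ]
    print(clicks_and_uu(window))
```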

Before: Fluentd

FOR EACH SERVICE

<match for.target.service>
type numeric_monitor
unit minute
tag service.response
output_key_prefix request_api
aggregate all
monitor_key api_response_time
percentiles 50,90,95,98,99
</match>

... AND RESTART OF FLUENTD!!!!!!!!!!!!!!

After: Norikra

FOR EACH SERVICE!

SELECT
percentiles(api_response_time, [50,90,95,98,99]) AS p
FROM target_service.win:time_batch(1 min)

WITHOUT ANY RESTARTS!

Conclusion
Retrospections are important
We now have many methods for retrospection
Prospections are also important
For complex logs
For immediate reports
For less system management
