12. What we want to do
COUNT PV (page views), UU (unique users), and others (daily/realtime)
COUNT Service metrics (daily/hourly/realtime)
FIND Unexpected Errors [4xx, 5xx] (immediately)
CHECK Response Times (immediately)
SEARCH Logs during Troubles (hourly/immediately)
VISUALIZE/NOTIFY App Status (realtime)
Thursday, November 7, 2013
13. Batches and Streams
Hadoop is for batches
High-performance batch processing is important
HDFS has good performance
Streaming log writes and calculations
are also VERY, VERY IMPORTANT
Hybrid System:
Stream processing + Batch
15. Data analytics players
PROGRAMMER
Raw Log Formats
Application Logs
Data Sizes
Data Semantics
SERVICE DIRECTOR
SALES
Whatever Metrics They Want
Storages
Hadoop Cluster
Visualization Tools
ADMINISTRATOR
........
BOARD MEMBER
16. Data analytics players
PROGRAMMER
Raw Log Formats
Application Logs
Data Sizes
Data Semantics
SERVICE DIRECTOR
SALES
Whatever Metrics They Want
WE NEED A QUERY LANGUAGE
THAT THEY ALL CAN RUN AND UNDERSTAND!!!
Storages
Hadoop Cluster
Visualization Tools
ADMINISTRATOR
........
BOARD MEMBER
24. Stream processing
Queries over fixed windows:
every 1 hour, 10 minutes, 1 minute, ...
latest 10 events, ...
all events
Once a query is registered, it runs forever
Results appear automatically
NO MORE STORAGE
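A minimal plain-Python sketch of the two window kinds above, for illustration only; it is not how Norikra or Esper implement them. `TimeBatchWindow` and `LengthWindow` are hypothetical names, timestamps are passed in explicitly, and a finished batch is emitted when the next event arrives rather than on a timer as a real engine would do.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class TimeBatchWindow:
    """Tumbling window: buffer events, run the query once per fixed period.

    Hypothetical helper. Timestamps (seconds) are passed in explicitly, and a
    finished batch is flushed when a later event arrives (no timer thread).
    """

    def __init__(self, period_sec: int, query: Callable[[List[dict]], object]):
        self.period_sec = period_sec
        self.query = query                      # the "registered" query
        self.batch_start: Optional[int] = None
        self.buffer: List[dict] = []

    def push(self, ts: int, event: dict) -> List[object]:
        """Add one event; return results for every batch that ended before ts."""
        results: List[object] = []
        if self.batch_start is None:
            self.batch_start = ts
        # Close each period that has fully elapsed (empty ones included).
        while ts >= self.batch_start + self.period_sec:
            results.append(self.query(self.buffer))
            self.buffer = []
            self.batch_start += self.period_sec
        self.buffer.append(event)
        return results


class LengthWindow:
    """Sliding window over the latest N events."""

    def __init__(self, size: int):
        self.events: Deque[dict] = deque(maxlen=size)  # oldest event drops out

    def push(self, event: dict) -> List[dict]:
        self.events.append(event)
        return list(self.events)


if __name__ == "__main__":
    win = TimeBatchWindow(period_sec=60, query=len)   # COUNT(*) per 1 minute
    for ts in [0, 10, 59, 61, 130]:
        for result in win.push(ts, {"ts": ts}):
            print("batch closed, count =", result)

    latest2 = LengthWindow(size=2)                    # "latest 2 events"
    for i in range(3):
        print(latest2.push({"i": i}))
```

The query is just a function handed to the window once; after that, results keep coming out as events flow in, with nothing written to intermediate storage.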
27. Norikra(1):
Schema-less event stream:
Add/Remove data fields whenever you want
SQL:
No more restarts to add/remove queries
w/ JOINs, w/ subqueries
w/ UDFs
Truly complex events:
Nested hashes/arrays, accessible directly from SQL
28. Norikra(2):
Open source software:
Licensed under GPLv2
Based on Esper
UDF plugins from rubygems.org
Ultra-fast bootstrap & small start:
3 mins to install/start
1 server
34. Norikra Queries: (3)
SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age
35. Norikra Queries: (3)
{"name":"tagomoris",
"age":34, "address":"Tokyo",
"corp":"LINE", "current":"Meguro"}
SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age
every 5 mins
{"age":34,"cnt":3}, {"age":33,"cnt":1}, ...
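What the 5-minute query computes for one batch can be approximated in Python as below. This is a hypothetical helper, not Norikra code, and the events are made up; events without an `age` field are simply skipped in this sketch.

```python
from collections import Counter
from typing import Dict, List


def count_by_age(batch: List[Dict]) -> List[Dict]:
    """One batch of:
    SELECT age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) GROUP BY age
    """
    # Events without an "age" field are skipped in this sketch.
    counts = Counter(event["age"] for event in batch if "age" in event)
    # Counter keeps first-seen order, so rows come out in arrival order.
    return [{"age": age, "cnt": cnt} for age, cnt in counts.items()]


if __name__ == "__main__":
    # Made-up events received during one 5-minute batch.
    batch = [
        {"name": "tagomoris", "age": 34, "address": "Tokyo",
         "corp": "LINE", "current": "Meguro"},
        {"name": "user-a", "age": 34},
        {"name": "user-b", "age": 33},
        {"name": "user-c", "age": 34},
    ]
    print(count_by_age(batch))
    # [{'age': 34, 'cnt': 3}, {'age': 33, 'cnt': 1}]
```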
36. Norikra Queries: (4)
{"name":"tagomoris",
"age":34, "address":"Tokyo",
"corp":"LINE", "current":"Meguro"}
SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age
SELECT max(age) as max
FROM events.win:time_batch(5 mins)
every 5 mins
{"age":34,"cnt":3}, {"age":33,"cnt":1}, ...
{"max":51}
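Both queries see every event on the same stream, and each produces its own result when the batch closes. A rough Python model of that, assuming queries are plain functions kept in a dict (the names `count_by_age`, `max_age` and `run_batch` are hypothetical): registering or removing a query is just changing the dict, in the spirit of "no more restarts to add/remove queries".

```python
from collections import Counter
from typing import Callable, Dict, List

Query = Callable[[List[dict]], List[dict]]


def count_by_age(batch: List[dict]) -> List[dict]:
    # SELECT age, COUNT(*) as cnt ... GROUP BY age
    counts = Counter(e["age"] for e in batch)
    return [{"age": a, "cnt": c} for a, c in counts.items()]


def max_age(batch: List[dict]) -> List[dict]:
    # SELECT max(age) as max ...  (this sketch returns no row for an empty batch)
    return [{"max": max(e["age"] for e in batch)}] if batch else []


def run_batch(queries: Dict[str, Query], batch: List[dict]) -> Dict[str, List[dict]]:
    """Feed the same batch to every registered query and collect each output."""
    return {name: query(batch) for name, query in queries.items()}


if __name__ == "__main__":
    registered = {"age_count": count_by_age, "age_max": max_age}
    batch = [{"age": 34}, {"age": 51}, {"age": 34}]
    print(run_batch(registered, batch))
```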
37. Norikra Queries: (5)
{"name":"tagomoris",
"user":{"age":34, "corp":"LINE",
"address":"Tokyo"},
"current":"Meguro",
"speaker":true,
"attend":[true,true,false, ...]
}
SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age
38. Norikra Queries: (5)
{"name":"tagomoris",
"user":{"age":34, "corp":"LINE",
"address":"Tokyo"},
"current":"Meguro",
"speaker":true,
"attend":[true,true,false, ...]
}
SELECT user.age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY user.age
39. Norikra Queries: (5)
{"name":"tagomoris",
"user":{"age":34, "corp":"LINE",
"address":"Tokyo"},
"current":"Meguro",
"speaker":true,
"attend":[true,true,false, ...]
}
SELECT user.age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
WHERE current="Meguro" AND attend.$0 AND attend.$1
GROUP BY user.age
40. Before: Hive
EVERY HOUR!
SELECT
yyyymmdd, hh, campaign_id, region, lang,
count(*) AS click,
count(distinct member_id) AS uu
FROM (
SELECT
yyyymmdd,
hh,
get_json_object(log, '$.campaign.id') AS campaign_id,
get_json_object(log, '$.member.region') AS region,
get_json_object(log, '$.member.lang') AS lang,
get_json_object(log, '$.member.id') AS member_id
FROM applog
WHERE service='myservice'
AND yyyymmdd='20131101' AND hh='00'
AND get_json_object(log, '$.type') = 'click'
) x
GROUP BY yyyymmdd, hh, campaign_id, region, lang
41. After: Norikra
SELECT
campaign.id AS campaign_id, member.region AS region,
count(*) AS click,
count(distinct member.id) AS uu
FROM myservice.win:time_batch(1 hours)
WHERE type="click"
GROUP BY campaign.id, member.region
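For comparison with the Hive version, one one-hour batch of the Norikra query can be modeled in Python as below. `hourly_clicks` is a hypothetical name, and the sketch assumes every click event carries `campaign.id`, `member.region` and `member.id`; `uu` is computed with a set per group, matching `count(distinct member.id)`.

```python
from collections import defaultdict
from typing import Any, Dict, List, Set, Tuple

Key = Tuple[Any, Any]  # (campaign.id, member.region)


def hourly_clicks(batch: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """One batch of the 1-hour query: clicks and unique users per group."""
    clicks: Dict[Key, int] = defaultdict(int)
    members: Dict[Key, Set[Any]] = defaultdict(set)
    for e in batch:
        if e.get("type") != "click":                         # WHERE type="click"
            continue
        key = (e["campaign"]["id"], e["member"]["region"])  # GROUP BY
        clicks[key] += 1                                     # count(*)
        members[key].add(e["member"]["id"])                  # count(distinct member.id)
    return [
        {"campaign_id": c, "region": r, "click": n, "uu": len(members[(c, r)])}
        for (c, r), n in clicks.items()
    ]
```

No JSON extraction step and no partition filter are needed here, because each event already arrives as a nested structure and the window decides which hour it belongs to.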
42. Norikra: Current Status
v0.1.0: Released on 2013/11/01
by tagomoris
http://norikra.github.io/
Documentation: under development
Just started being used in production