1. Hive Tools in NHN Japan
Hadoop Source Code Reading Vol.9
2012/05/30
@tagomoris (TAGOMORI Satoshi)
Wednesday, May 30, 2012
2. @tagomoris
NHN Japan Corp
Web Service Division
3. Hive in NHN Japan
Reporting of access log (not analysis)
Pageviews and/or Unique Users?
Accesses under specified condition?
Hey, how many accesses did our new features get?
new bot accesses? any troubles?
4. SELECT yyyymmdd,
          count(is_pc(pa)) as pc,
          count(is_smartphone(pa)) as smartphone,
          count(is_mobilephone(pa)) as mobilephone
   FROM (
     SELECT yyyymmdd, parse_agent(agent) as pa
     FROM access_log
     WHERE service='__SERVICE__'
       AND (yyyymmdd='__1DAYS_AGO__'
            OR yyyymmdd='__2DAYS_AGO__')
       AND NOT flag
   ) x
   GROUP BY yyyymmdd
   ORDER BY yyyymmdd LIMIT 2
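The query above is a template: `__SERVICE__`, `__1DAYS_AGO__` and `__2DAYS_AGO__` have to be filled in before it reaches Hive. The slides do not show how that expansion is done, so the following is only a minimal sketch. It assumes `__NDAYS_AGO__` means "N days before the run date, formatted as yyyymmdd"; the function name `expand_placeholders` and the service name `blog` are hypothetical.

```python
import re
from datetime import date, timedelta

# Assumption: __NDAYS_AGO__ stands for the date N days before the run date.
_DAYS_AGO = re.compile(r"__(\d+)DAYS_AGO__")


def expand_placeholders(query: str, service: str, today: date) -> str:
    """Fill __SERVICE__ and __NDAYS_AGO__ placeholders in a query template.

    This is plain text substitution with no escaping, so `service` must
    come from a trusted list of service names.
    """

    def days_ago(match):
        n = int(match.group(1))
        return (today - timedelta(days=n)).strftime("%Y%m%d")

    expanded = _DAYS_AGO.sub(days_ago, query)
    return expanded.replace("__SERVICE__", service)


template = (
    "SELECT count(*) FROM access_log "
    "WHERE service='__SERVICE__' "
    "AND (yyyymmdd='__1DAYS_AGO__' OR yyyymmdd='__2DAYS_AGO__')"
)
print(expand_placeholders(template, "blog", date(2012, 5, 30)))
```

Passing the run date in as a parameter, instead of reading the clock inside the function, keeps a periodic job reproducible when it has to be re-run for a past day.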
6. Today's topic
For Fluentd,
See 'Software Design'
2012/06
7. System overview (diagram)
Fluentd Cluster (stream, backup stream, realtime monitoring stream)
Hoop Server (HttpFs)
Hadoop / HDFS
Hive Server
Shib (Hive Client Web Application)
ShibUI (Query Management System)
Users (Web Browser)
8. Why Hive?
Handmade MapReduce: Noooooooooooooooo
Pig? Hive?
We all love 'xQL' like 'SQL'...
Forces us to throw away all queries
"The courage to write processing and throw it away"
We are likely to maintain 'programs' (like Pig scripts)
With changing data, it is BAD to keep maintaining how we handle the data
9. Client Tools?
'hive' command sucks
Hue (Beeswax for Hive)?
we want end-users to run 'SELECT' only.
we want an HTTP API to work with other systems
Periodic query execution, and graph plotting
Miscellaneous extensions we want (and that are easy to write)
13. Shib
https://github.com/tagomoris/shib
Hive Client Web Application
Run 'SELECT' queries only
Store results of queries
Provides HTTP API:
to run queries
to get result data of queries
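A client for the two API operations listed above (run a query, fetch its stored result) could look roughly like the sketch below. It is not Shib's actual API: the paths `/queries` and `/queries/<id>/result`, the JSON reply with an `id` field, and the tab-separated result format are all assumptions made for illustration, as is the class name `QueryApiClient`. Check the Shib repository for the real endpoints. The transport is injected so the logic can be exercised without a running server.

```python
import json
from typing import Callable, List, Optional
from urllib import parse, request

# A transport takes (method, url, body) and returns the response body as text.
Transport = Callable[[str, str, Optional[bytes]], str]


def urllib_transport(method: str, url: str, body: Optional[bytes]) -> str:
    """Default transport: perform a real HTTP request with urllib."""
    req = request.Request(url, data=body, method=method)
    with request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


class QueryApiClient:
    """Client for a Shib-like HTTP API. All paths here are hypothetical."""

    def __init__(self, base_url: str, transport: Transport = urllib_transport):
        self.base_url = base_url.rstrip("/")
        self.transport = transport

    def run_query(self, hiveql: str) -> str:
        """Submit a query and return its id. Only SELECT is accepted."""
        if not hiveql.lstrip().upper().startswith("SELECT"):
            raise ValueError("only SELECT queries are allowed")
        body = parse.urlencode({"query": hiveql}).encode("utf-8")
        reply = self.transport("POST", self.base_url + "/queries", body)
        return json.loads(reply)["id"]

    def get_result(self, query_id: str) -> List[List[str]]:
        """Fetch stored result rows, assumed here to be tab-separated text."""
        url = self.base_url + "/queries/" + parse.quote(query_id) + "/result"
        text = self.transport("GET", url, None)
        return [line.split("\t") for line in text.splitlines()]
```

The client-side SELECT check only mirrors the "Run 'SELECT' queries only" rule for early feedback; the server would still have to enforce it.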
17. ShibUI
(undisclosed application)
Web Front-end of Shib
Daily/Weekly/Monthly Query Management System
Graph plotting of query results
Records view logs to find queries no one views...
Query Builder (for engineers/directors unfamiliar with Hive)
(Under construction)
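The "queries no one views" check can be illustrated with a small sketch. ShibUI is not public, so nothing here reflects its real data model: the function `unviewed_queries`, the `(query name, view date)` log format, and the 30-day threshold are all assumptions.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, List, Tuple


def unviewed_queries(
    scheduled: Iterable[str],
    view_log: Iterable[Tuple[str, date]],
    today: date,
    max_idle_days: int = 30,
) -> List[str]:
    """Return scheduled queries with no recorded view in the last max_idle_days."""
    # Keep only the most recent view date per query.
    last_viewed: Dict[str, date] = {}
    for name, viewed_on in view_log:
        if name not in last_viewed or viewed_on > last_viewed[name]:
            last_viewed[name] = viewed_on

    cutoff = today - timedelta(days=max_idle_days)
    # A query is idle if it was never viewed or last viewed before the cutoff.
    return [
        name
        for name in scheduled
        if name not in last_viewed or last_viewed[name] < cutoff
    ]


log = [("daily_pv", date(2012, 5, 29)), ("weekly_uu", date(2012, 3, 1))]
print(unviewed_queries(["daily_pv", "weekly_uu", "monthly_bot"], log, date(2012, 5, 30)))
```

Queries flagged this way are candidates for removal from the daily/weekly/monthly schedule, which frees cluster time for reports that are actually read.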
20. Hadoop / HDFS
Hive Server
Shib (node.js)
  HTTP
ShibUI (Perl/Plack Web Application: Kossy)
  HTTP/Ajax
Users (Web Browser)
HRForecast, MySQL
24. What to do next
MapReduce Job management
check that queries run correctly
kill queries
Huahin Manager by @ryu_kobayashi
Hadoop MapReduce Job Manager over HTTP
http://huahin.github.com/huahin-manager/
Shib version upgrade
node.js 0.4 based -> 0.6 based