Hive Tools in NHN Japan #hadoopreading


  1. Hive Tools in NHN Japan. Hadoop Source Code Reading Vol.9, 2012/05/30. @tagomoris (TAGOMORI Satoshi). Slide footer: Wednesday, May 30, 2012.
  2. @tagomoris: NHN Japan Corp, Web Service Division
  3. Hive in NHN Japan
       • Reporting of access logs (not analysis)
       • Pageviews and/or unique users?
       • Accesses under a specified condition?
       • "Hey, how many accesses did our new features get?"
       • New bot accesses? Any troubles?
  4. SELECT yyyymmdd,
            count(is_pc(pa)) AS pc,
            count(is_smartphone(pa)) AS smartphone,
            count(is_mobilephone(pa)) AS mobilephone
     FROM (
       SELECT yyyymmdd, parse_agent(agent) AS pa
       FROM access_log
       WHERE service=__SERVICE__
         AND (yyyymmdd=__1DAYS_AGO__ OR yyyymmdd=__2DAYS_AGO__)
         AND NOT flag
     ) x
     GROUP BY yyyymmdd
     ORDER BY yyyymmdd
     LIMIT 2
  6. Today's topic. For Fluentd, see Software Design 2012/06.
  7. [Architecture diagram] Labels: stream; backup stream; realtime monitoring stream; Fluentd Cluster; Hoop Server (HttpFs); Hadoop / HDFS; Hive Server; Shib (Hive Client Web Application); ShibUI (Query Management System); Users (Web Browser)
  8. Why Hive?
       • Handmade MapReduce: Noooooooooooooooo
       • Pig? Hive?
       • We all love an xQL that looks like SQL...
       • FORCE ourselves to throw away all queries: "the courage to write processing and throw it away"
       • We are likely to end up maintaining programs (like Pig scripts)
       • With changing data, it is BAD to maintain how the data is handled
  9. Client Tools?
       • The hive command sucks
       • Hue (Beeswax for Hive)?
       • We want end users to run SELECT only
       • We want an HTTP API to work with other systems
       • Periodic query execution and graph plotting
       • Miscellaneous extensions we want (and that are easy to write)
  10. Copy & Paste Based Query Management. Non-referred Queries MUST DIE
  11-12. [Architecture diagram, same labels as slide 7]
  13. Shib https://github.com/tagomoris/shib
       • Hive Client Web Application
       • Runs SELECT queries only
       • Stores results of queries
       • Provides an HTTP API: to run queries; to get result data of queries
  14. [Architecture diagram, same labels as slide 7]
  15. [Diagram] Hadoop / HDFS; Hive Server -- Thrift -- Shib (node.js) -- HTTP/Ajax -- Users (Web Browser); DataStore (Kyoto Tycoon)
  17. ShibUI (undisclosed application)
       • Web front end of Shib
       • Daily/weekly/monthly query management system
       • Graph plotting of query results
       • Records logs to find queries that no one views...
       • Query Builder (for Hive-unfriendly engineers/directors) (under construction)
  18-19. [Architecture diagram, same labels as slide 7]
  20. [Diagram] Hadoop / HDFS; Hive Server; Shib (node.js) -- HTTP -- ShibUI (Perl/Plack Web Application: Kossy) -- HTTP/Ajax -- Users (Web Browser); HRForecast; MySQL
  24. What to do next
       • MapReduce job management: check that queries run correctly; kill queries
       • Huahin Manager by @ryu_kobayashi: Hadoop MapReduce Job Manager over HTTP, http://huahin.github.com/huahin-manager/
       • Shib version upgrade: node.js 0.4 based -> 0.6 based
  25. Questions?
  26. Thanks!
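
The query on slide 4 uses placeholders such as __SERVICE__ and __1DAYS_AGO__, but the slides do not show how they are filled in. The following is only a minimal sketch of one way such tokens could be expanded before a query is sent to Hive. The function name expand_placeholders is hypothetical, and it assumes that yyyymmdd is a string column in YYYYMMDD form and that both values should be emitted as single-quoted literals; none of this is confirmed by the slides.

```python
import re
from datetime import date, timedelta


def expand_placeholders(query: str, service: str, today: date) -> str:
    """Expand __SERVICE__ and __<N>DAYS_AGO__ tokens (hypothetical helper)."""

    def days_ago(match: re.Match) -> str:
        # Assumption: dates are compared as quoted YYYYMMDD strings.
        n = int(match.group(1))
        return "'" + (today - timedelta(days=n)).strftime("%Y%m%d") + "'"

    # Expand date tokens first so the service name is never rescanned.
    expanded = re.sub(r"__(\d+)DAYS_AGO__", days_ago, query)
    # Assumption: backslash-escaping keeps the name inside one string literal.
    quoted_service = "'" + service.replace("'", "\\'") + "'"
    return expanded.replace("__SERVICE__", quoted_service)


template = (
    "SELECT count(*) FROM access_log "
    "WHERE service=__SERVICE__ AND yyyymmdd=__1DAYS_AGO__"
)
print(expand_placeholders(template, "blog", date(2012, 5, 30)))
```

Passing the reference date in explicitly, rather than reading the clock inside the function, keeps a periodic query reproducible when it is re-run for a past day.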

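
Slide 13 says Shib runs SELECT queries only. The slides do not describe how that restriction is enforced, so the following is a hedged sketch of one conservative check, not Shib's actual implementation. The function name is_select_only is hypothetical, and the check deliberately rejects some valid queries (for example, any query with a semicolon inside a string literal) rather than trying to parse HiveQL.

```python
import re


def is_select_only(query: str) -> bool:
    """Return True only for a single statement that starts with SELECT."""
    # Drop "--" line comments; naive, since it ignores string literals.
    stripped = re.sub(r"--[^\n]*", "", query).strip()
    # Allow one trailing semicolon, then refuse any remaining one,
    # because it could separate a second statement.
    if stripped.endswith(";"):
        stripped = stripped[:-1].rstrip()
    if ";" in stripped:
        return False
    # Statements such as INSERT ... SELECT or FROM ... INSERT do not
    # begin with SELECT, so they are rejected here.
    return re.match(r"(?i)select\b", stripped) is not None


print(is_select_only("SELECT yyyymmdd FROM access_log LIMIT 2"))
print(is_select_only("DROP TABLE access_log"))
```

A prefix check like this is a first line of defense only; restricting the permissions of the account that the web application uses would be a natural complement.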