Log Analysis System And its designs in LINE Corp. 2014 early

LINE developer meetup in fukuoka 1 #LINE_DM

Published in: Technology

  1. Log Analysis Systems And its designs In LINE Corp. 2014 Early. 2014/02/20 (Thu). @tagomoris (TAGOMORI Satoshi), LINE Corp. LINE Developer Meetup in Fukuoka #1. [Slide footer: Thursday, February 20, 2014]
  2. TAGOMORI Satoshi (@tagomoris), LINE Corp., Development Support Team
  5. Data Collecting, Aggregation, Analytics, Visualization
  6. See also: 「OSSで支えられるライブドアの巨大ログ集計」 ("Livedoor's huge log aggregation, supported by OSS") (2012 Summer) http://www.slideshare.net/tagomoris/oss-nhntech ; "Log analysis system with Hadoop in livedoor 2013 Winter" (2013 early) http://www.slideshare.net/tagomoris/log-analysis-with-hadoop-in-livedoor-2013 ; "Batch and Stream processing with SQL" (2013 Fall) http://www.slideshare.net/tagomoris/batch-and-stream-processing-with-sql
  7. Disclaimer: This talk is about "a" log analysis system in LINE.
  8. Do you like SQL?
  9. System Overview (2014) [diagram]. Components: Web Servers, Fluentd Cluster, Archive Storage (scribed), Fluentd Watchers, Notifications (IRC), Graph Tools, Norikra, webhdfs, Hadoop Cluster (HDFS, MR), hive server, Huahin Manager, Presto Cluster, Shib, ShibUI. Path labels: STREAM, BATCH, SCHEDULED BATCH.
  10. System Overview (2014) [same diagram, annotated with the labels Ruby, Java, Node, Perl].
  11. System Overview (2014) [same diagram, annotated with the labels fluentd.conf and SQL].
  12. Who uses it? Internet Messaging Service; Public Web Service; Game; Private Web Service (for closed person-to-person use); Internal Web Service (administrator only); Data Analytics Service
  13. Who uses it? Internet Messaging Service; Public Web Service; Game; Private Web Service (for closed person-to-person use); Internal Web Service (administrator only); Data Analytics Service
  14. Data analytics players [diagram]. Roles: PROGRAMMER, SERVICE DIRECTOR, SALES, ADMINISTRATOR, ........, BOARD MEMBER. Topics on the diagram: Raw Log Formats, Application Logs, Data Sizes, Data Semantics, Whatever Metrics They Want, Storages, Hadoop Cluster, Visualization Tools.
  15. Data analytics players [same diagram, with overlay text]: WE NEED A QUERY LANGUAGE THAT THEY ALL CAN RUN AND UNDERSTAND!!!!!!!!!!
  16. [System diagram] Components: Web Servers, Fluentd Cluster, Archive Storage (scribed), Fluentd Watchers, Notifications (IRC), Graph Tools, webhdfs, Hadoop Cluster (HDFS, MR), hive server, Huahin Manager, Shib, ShibUI. Path labels: STREAM, BATCH, SCHEDULED BATCH.
  17. [System diagram] Same components as slide 16, plus Norikra and Presto Cluster.
  19. SQL: Hive
  20. SQL: Hive
  21. Norikra: Schema-less Stream Processing with SQL
  23. Software Stack. Hadoop: CDH 4.5.0 w/ JDK6 (WebHDFS, Hive, HiveServer); Presto: 0.59 w/ JDK7; Shib: v0.3.0 w/ Node.js v0.10; Fluentd: v0.10.39 w/ Ruby 2.0.0, and many plugins; Norikra: v0.1.3 w/ JRuby 1.7.4
  25. Batches and Streams. Hadoop is for batches; high-performance batch is important; HDFS has good performance. Stream log writing and calculations are also VERY VERY IMPORTANT. Hybrid System: Stream processing + Batch
  26. Collect and deliver as STREAM; Calculate as BATCH
  27. 1st gen: First impl. [diagram] Components: Web Servers, Scribed, Archive Storage (scribed), Hadoop Cluster CDH3b2 (Hadoop Streaming), hive server, Shib. Path labels: STREAM (LIBHDFS), BATCH.
  28. Hadoop and Hive. Filesystem (HDFS); Processing Framework (Hadoop MapReduce); Query Compiler: SQL -> MR (Hive); Thrift API Server (HiveServer); Old style Java (....)
  29. Shib. WebUI Client for Hive; Query editor/executor + result viewer; HTTP JSON API Gateway for Hive query execution; Node.js
  30. 2nd gen: +Fluentd [diagram] Components: Web Servers, Fluentd Cluster, Archive Storage (scribed), Cloudera Hoop, Hadoop Cluster CDH3u2 (Hive), hive server, Huahin Manager, Shib. Path labels: STREAM, BATCH.
  31. Fluentd. Log collector; Apache-like configuration; Pluggable Input/Output/Buffer, on a public plugin repository (rubygems.org); Ruby 1.9 or later. Collect, and Store: collect with fluent-agent-lite (perl), store with fluent-plugin-webhdfs
  32. Collect and deliver as STREAM; Calculate as BATCH; Monitor as STREAM
  33. 3rd gen: +Monitoring [diagram] Components: Web Servers, Fluentd Cluster, Archive Storage (scribed), Fluentd Watchers, Notifications (IRC), Graph Tools, webhdfs, Hadoop Cluster CDH3u5 (Hive), hive server, Huahin Manager, Shib, ShibUI. Path labels: STREAM, BATCH, SCHEDULED BATCH.
  34. Fluentd plugins. Monitoring in real-time: message num/size counting; min, max, average and percentiles. Visualization and Notification: Graph tools (GrowthForecast / Focuslight); IRC (or Mail, HipChat, ...)
  35. 4th gen: +HA (hadoop) [diagram] Components: Web Servers, Fluentd Cluster, Archive Storage (scribed), Fluentd Watchers, Notifications (IRC), Graph Tools, webhdfs, Hadoop Cluster CDH4 (HDFS, YARN), hive server, Huahin Manager, Shib, ShibUI. Path labels: STREAM, BATCH, SCHEDULED BATCH.
  36. Calculate as STREAM on demand; Collect and deliver as STREAM; Calculate as BATCH; Monitor as STREAM
  37. 5th gen: +Norikra [diagram] Components: Web Servers, Fluentd Cluster, Archive Storage (scribed), Fluentd Watchers, Notifications (IRC), Graph Tools, Norikra, webhdfs, Hadoop Cluster (HDFS, MR), hive server, Huahin Manager, Shib, ShibUI. Path labels: STREAM, BATCH, SCHEDULED BATCH.
  38. Norikra. SQL Query for Streams; Add/Remove on demand (without restarts); ... and many features; HTTP JSON API; JRuby on JVM with Esper
  39. Norikra Queries: (1). Event: {"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Fukuoka"}. Query: SELECT name, age FROM events WHERE current="Fukuoka". Output: {"name":"tagomoris","age":34}
  40. Norikra Queries: (2). Event: {"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Fukuoka"}. Query: SELECT age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) WHERE current="Fukuoka" GROUP BY age. Output, every 5 mins: {"age":34,"cnt":3}, {"age":33,"cnt":1}, ...
  41. Calculate as STREAM on demand; Collect and deliver as STREAM; Calculate as BATCH; Monitor as STREAM; Calculate as BATCH immediately on demand
  42. 5th gen: +Presto [diagram] Components: Web Servers, Fluentd Cluster, Archive Storage (scribed), Fluentd Watchers, Notifications (IRC), Graph Tools, Norikra, webhdfs, Hadoop Cluster (HDFS, MR), hive server, Huahin Manager, Presto Cluster, Shib, ShibUI. Path labels: STREAM, BATCH, SCHEDULED BATCH.
  43. Presto. Open sourced by Facebook on 2013/11/07; MPP Engine: Massively Parallel Processing Engine, like Google BigQuery (Dremel), Cloudera Impala; short latency queries (not the main use of Hive); SQL; HTTP JSON API; Java 7!
  44. Shib v0.3.0: presto support [diagram] Components: User (browser), Analysis Batches, Service Admin Tools, Shib, HiveServer, HiveServer2, Presto. Connection labels: HTTP JSON API, THRIFT.
  45. Non-monolithic architecture. Many subsystems for many purposes; Add/Update/Replace per subsystem; High interoperability by RPC-based connections; Gateway can hide backend implementations
  46. WHAT TO DO IS NOT WHAT WE WANT TO BUT WHAT WE ARE WANTED TO.
  47. THERE ARE MANY OF WHAT TO DO! THANKS!
  48. Software list: http://fluentd.org/ http://prestodb.io/ http://norikra.github.io/ https://github.com/tagomoris/shib
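
Note on slides 39 and 40: the two Norikra queries can be read as a filter-and-project over single events, and a per-group count over a fixed time window. The following is a minimal Python sketch of those semantics only. It does not use Norikra or Esper; the function names, the `ts` timestamp field and the sample events are assumptions made for illustration. Windows here are aligned to multiples of `window_secs`, which is a simplification and may differ from how the real engine starts its windows.

```python
from collections import Counter, defaultdict

# Hypothetical sample events. "ts" (seconds) is an assumed field, not on the slides.
SAMPLE_EVENTS = [
    {"name": "tagomoris", "age": 34, "address": "Tokyo", "corp": "LINE",
     "current": "Fukuoka", "ts": 10},
    {"name": "a", "age": 34, "current": "Fukuoka", "ts": 20},
    {"name": "b", "age": 33, "current": "Fukuoka", "ts": 30},
    {"name": "c", "age": 34, "current": "Tokyo", "ts": 40},
    {"name": "d", "age": 34, "current": "Fukuoka", "ts": 310},
]


def select_name_age(events, current):
    """Rough equivalent of: SELECT name, age FROM events WHERE current = <current>."""
    return [{"name": e["name"], "age": e["age"]}
            for e in events if e.get("current") == current]


def count_by_age_in_batches(events, current, window_secs=300):
    """Rough equivalent of:
    SELECT age, COUNT(*) AS cnt FROM events.win:time_batch(5 mins)
    WHERE current = <current> GROUP BY age

    Returns {window_index: [{"age": ..., "cnt": ...}, ...]}, one entry per window.
    """
    batches = defaultdict(Counter)
    for e in events:
        if e.get("current") == current:
            # Assign the event to a fixed window by integer division of its timestamp.
            batches[e["ts"] // window_secs][e["age"]] += 1
    return {
        window: [{"age": age, "cnt": cnt} for age, cnt in sorted(counts.items())]
        for window, counts in sorted(batches.items())
    }
```

Calling `count_by_age_in_batches(SAMPLE_EVENTS, "Fukuoka")` groups the first three matching events into one window and the last event into the next, which mirrors the "every 5 mins" output on slide 40.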
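
Note on slide 34: the real-time monitoring listed there (message num/size counting; min, max, average and percentiles) amounts to summarizing the values seen in each interval. The sketch below shows one way to compute such summaries in plain Python. It is not the code of any Fluentd plugin; the function names, the output keys and the nearest-rank percentile method are assumptions chosen for illustration.

```python
def count_messages(messages):
    """Count messages and their total size in bytes (UTF-8) for one interval."""
    return {
        "num": len(messages),
        "bytes": sum(len(m.encode("utf-8")) for m in messages),
    }


def summarize(values, percentiles=(90, 95, 99)):
    """Return min, max, average and nearest-rank percentiles of numeric values.

    Percentiles must be integers from 1 to 100. Returns None for empty input.
    """
    if not values:
        return None
    ordered = sorted(values)
    n = len(ordered)
    summary = {
        "num": n,
        "min": ordered[0],
        "max": ordered[-1],
        "avg": sum(ordered) / n,
    }
    for p in percentiles:
        # Nearest-rank method: rank = ceil(p * n / 100), done in integer arithmetic.
        rank = max(1, (p * n + 99) // 100)
        summary[f"percentile_{p}"] = ordered[rank - 1]
    return summary
```

In a setup like the one on the slides, a summary of this kind would be produced once per interval and sent on to a graph tool or a notification channel.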
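
Note on slides 44 and 45: the point that a gateway can hide backend implementations can be illustrated with a small dispatcher that accepts a query and forwards it to whichever engine is registered. This is a hypothetical sketch, not Shib's actual code: `QueryGateway`, `fake_hive` and `fake_presto` are invented names, and the fake engines stand in for real Thrift or HTTP JSON clients.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Engine = Callable[[str], List[Row]]


class QueryGateway:
    """Single entry point for queries; callers never talk to a backend directly."""

    def __init__(self, default_engine: str) -> None:
        self._engines: Dict[str, Engine] = {}
        self._default = default_engine

    def register(self, name: str, engine: Engine) -> None:
        # Backends can be added or replaced without changing any caller.
        self._engines[name] = engine

    def execute(self, sql: str, engine: Optional[str] = None) -> Dict[str, object]:
        name = engine or self._default
        if name not in self._engines:
            raise KeyError(f"unknown engine: {name}")
        return {"engine": name, "rows": self._engines[name](sql)}


# Stand-ins for real clients (assumptions for illustration only).
def fake_hive(sql: str) -> List[Row]:
    return [{"via": "hive", "sql": sql}]


def fake_presto(sql: str) -> List[Row]:
    return [{"via": "presto", "sql": sql}]
```

A caller would register both engines once, then send the same SQL string with or without an explicit engine name; swapping a backend only touches the `register` call.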
