Log Analysis System And its designs in LINE Corp. 2014 early
 


LINE Developer Meetup in Fukuoka #1 #LINE_DM


    Presentation Transcript

    • Log Analysis Systems And its designs In LINE Corp. 2014 Early. 2014/02/20 (Thu), LINE Developer Meetup in Fukuoka #1. @tagomoris (TAGOMORI Satoshi), LINE Corp.
    • TAGOMORI Satoshi (@tagomoris), LINE Corp., Development Support Team
    • Data Collecting, Aggregation, Analytics, Visualization
    • See also: "Livedoor's Huge-Scale Log Aggregation Supported by OSS" (2012 Summer) http://www.slideshare.net/tagomoris/oss-nhntech ; "Log analysis system with Hadoop in livedoor 2013 Winter" (2013 early) http://www.slideshare.net/tagomoris/log-analysis-with-hadoop-in-livedoor-2013 ; "Batch and Stream processing with SQL" (2013 Fall) http://www.slideshare.net/tagomoris/batch-and-stream-processing-with-sql
    • Disclaimer: this talk is about "a" log analysis system in LINE.
    • Do you like SQL?
    • System Overview (2014) [diagram]. Components: Web Servers, Fluentd Cluster, Archive Storage (scribed), Fluentd Watchers, Notifications (IRC), Graph Tools, Norikra, webhdfs, Hadoop Cluster (HDFS, MR), hive server, Huahin Manager, Presto Cluster, Shib, ShibUI. Flow labels: STREAM, BATCH, SCHEDULED BATCH.
    • System Overview (2014) [same diagram, annotated with implementation languages]. Language labels: Ruby, Java, Node, Perl.
    • System Overview (2014) [same diagram, annotated with user-facing interfaces]. Interface labels: fluentd.conf, SQL.
    • Who uses it? Internet Messaging Service; Public Web Service; Game; Private Web Service (closed, person-to-person); Internal Web Service (administrator only); Data Analytics Service
    • Data analytics players [diagram]. Players: PROGRAMMER, SERVICE DIRECTOR, SALES, ADMINISTRATOR, ........, BOARD MEMBER. Concerns: Raw Log Formats, Application Logs, Data Sizes, Data Semantics, Whatever Metrics They Want, Storages, Hadoop Cluster, Visualization Tools.
    • Data analytics players [same diagram, with overlay]: WE NEED A QUERY LANGUAGE THAT THEY ALL CAN RUN AND UNDERSTAND!
    • [System diagram without Norikra and Presto] Components: Web Servers, Fluentd Cluster, Archive Storage (scribed), Fluentd Watchers, Notifications (IRC), Graph Tools, webhdfs, Hadoop Cluster (HDFS, MR), hive server, Huahin Manager, Shib, ShibUI. Flow labels: STREAM, BATCH, SCHEDULED BATCH.
    • [System diagram with Norikra and Presto Cluster added] Components: Web Servers, Fluentd Cluster, Archive Storage (scribed), Fluentd Watchers, Notifications (IRC), Graph Tools, Norikra, webhdfs, Hadoop Cluster (HDFS, MR), hive server, Huahin Manager, Presto Cluster, Shib, ShibUI. Flow labels: STREAM, BATCH, SCHEDULED BATCH.
    • SQL: Hive
    • Norikra: Schema-less Stream Processing with SQL
    • Software Stack. Hadoop: CDH 4.5.0 w/ JDK6 (WebHDFS, Hive, HiveServer); Presto: 0.59 w/ JDK7; Shib: v0.3.0 w/ Node.js v0.10; Fluentd: v0.10.39 w/ Ruby 2.0.0, and many plugins; Norikra: v0.1.3 w/ JRuby 1.7.4
    • Batches and Streams. Hadoop is for batches; high-performance batch is important; HDFS has good performance; stream log writing and calculations are also VERY VERY IMPORTANT. Hybrid system: stream processing + batch
    • Collect and deliver as STREAM; Calculate as BATCH
    • 1st gen: First impl. [diagram]. Components: Web Servers, Scribed, Archive Storage (scribed), Hadoop Cluster CDH3b2 (Hadoop Streaming), hive server, Shib. Flow labels: STREAM (LIBHDFS), BATCH.
    • Hadoop and Hive. Filesystem (HDFS); processing framework (Hadoop MapReduce); query compiler, SQL -> MR (Hive); Thrift API server (HiveServer); old-style Java (....)
    • Shib. WebUI client for Hive: query editor/executor + result viewer; HTTP JSON API gateway for Hive query execution; Node.js
    • 2nd gen: +Fluentd [diagram]. Components: Web Servers, Fluentd Cluster, Archive Storage (scribed), Cloudera Hoop, Hadoop Cluster CDH3u2 (Hive), hive server, Huahin Manager, Shib. Flow labels: STREAM, BATCH.
    • Fluentd. Log collector; Apache-like configuration; pluggable Input/Output/Buffer, with plugins on a public repository (rubygems.org); Ruby 1.9 or later. Collect and store: collect with fluent-agent-lite (perl), store with fluent-plugin-webhdfs
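A minimal sketch of the "store" step on the Fluentd slide, showing how a time-partitioned HDFS path might be expanded and turned into a WebHDFS REST URL. The host, user, and path layout are hypothetical and the port is an assumed NameNode HTTP port; only the `/webhdfs/v1/<path>?op=...` URL shape follows the WebHDFS REST API. No request is sent, and this is not the code of fluent-plugin-webhdfs.

```python
from datetime import datetime
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path_pattern, when, op, user):
    """Build a WebHDFS REST URL for a time-partitioned log file."""
    # Expand strftime placeholders such as %Y%m%d in the path pattern.
    path = when.strftime(path_pattern)
    query = urlencode({"op": op, "user.name": user})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


url = webhdfs_url(
    host="namenode.example.com",  # hypothetical host
    port=50070,  # assumed NameNode HTTP port
    path_pattern="/log/access/%Y%m%d/access-%H.log",  # hypothetical layout
    when=datetime(2014, 2, 20, 13, 5),
    op="APPEND",
    user="fluentd",  # hypothetical user
)
print(url)
```

Partitioning the path by date and hour keeps each batch query scoped to the files it needs.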
    • Collect and deliver as STREAM; Calculate as BATCH; Monitor as STREAM
    • 3rd gen: +Monitoring [diagram]. Components: Web Servers, Fluentd Cluster, Archive Storage (scribed), Fluentd Watchers, Notifications (IRC), Graph Tools, webhdfs, Hadoop Cluster CDH3u5 (Hive), hive server, Huahin Manager, Shib, ShibUI. Flow labels: STREAM, BATCH, SCHEDULED BATCH.
    • Fluentd plugins. Monitoring in real time: message num/size counting; min, max, average and percentiles. Visualization and notification: graph tools (GrowthForecast / Focuslight); IRC (or Mail, HipChat, ...)
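The monitoring metrics named on the slide above (count, min, max, average, percentiles) can be sketched for one window of numeric values as follows. This is an illustrative sketch, not the code of any Fluentd plugin; the function name, the sample response times, and the choice of the nearest-rank percentile method are assumptions.

```python
def summarize(values, percentiles=(50, 90, 95, 99)):
    """Return count, sum, min, max, average and nearest-rank percentiles."""
    if not values:
        return {"num": 0}
    ordered = sorted(values)
    n = len(ordered)
    summary = {
        "num": n,
        "sum": sum(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "avg": sum(ordered) / n,
    }
    for p in percentiles:
        # Nearest-rank percentile: rank = ceil(p * n / 100), in integer math.
        rank = max(1, (p * n + 99) // 100)
        summary[f"p{p}"] = ordered[rank - 1]
    return summary


# Hypothetical response times (ms) seen during one monitoring interval.
response_times_ms = [12, 15, 11, 250, 14, 13, 18, 16, 900, 17]
stats = summarize(response_times_ms)
print(stats)
```

Percentiles matter here because a few slow requests barely move the median but dominate the average.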
    • 4th gen: +HA (hadoop) [diagram]. Components: Web Servers, Fluentd Cluster, Archive Storage (scribed), Fluentd Watchers, Notifications (IRC), Graph Tools, webhdfs, Hadoop Cluster CDH4 (HDFS, YARN), hive server, Huahin Manager, Shib, ShibUI. Flow labels: STREAM, BATCH, SCHEDULED BATCH.
    • Calculate as STREAM on demand; Collect and deliver as STREAM; Calculate as BATCH; Monitor as STREAM
    • 5th gen: +Norikra [diagram]. Components: Web Servers, Fluentd Cluster, Archive Storage (scribed), Fluentd Watchers, Notifications (IRC), Graph Tools, Norikra, webhdfs, Hadoop Cluster (HDFS, MR), hive server, Huahin Manager, Shib, ShibUI. Flow labels: STREAM, BATCH, SCHEDULED BATCH.
    • Norikra. SQL queries for streams; add/remove queries on demand (without restarts); ... and many features; HTTP JSON API; JRuby on JVM with Esper
    • Norikra Queries: (1) Input event: {"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Fukuoka"} Query: SELECT name, age FROM events WHERE current="Fukuoka" Output: {"name":"tagomoris","age":34}
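What query (1) computes can be sketched in plain Python as a filter plus projection over schema-less JSON events. This only illustrates the semantics and does not use Norikra or Esper; the helper name and the second sample event are hypothetical, and filling missing fields with `None` is a simplification of this sketch.

```python
import json


def select_where(events, fields, predicate):
    """Yield only the requested fields of events that match the predicate."""
    for event in events:
        if predicate(event):
            # In this sketch a field missing from an event becomes None.
            yield {name: event.get(name) for name in fields}


raw = ('{"name":"tagomoris", "age":34, "address":"Tokyo", '
       '"corp":"LINE", "current":"Fukuoka"}')
events = [
    json.loads(raw),
    {"name": "someone", "age": 20, "current": "Tokyo"},  # hypothetical event
]

result = list(
    select_where(events, ["name", "age"], lambda e: e.get("current") == "Fukuoka")
)
print(result)
```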
    • Norikra Queries: (2) Input event: {"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Fukuoka"} Query: SELECT age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) WHERE current="Fukuoka" GROUP BY age Output, every 5 mins: {"age":34,"cnt":3}, {"age":33,"cnt":1}, ...
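The windowed aggregation in query (2) can be approximated in plain Python by bucketing events into 5-minute batches and counting per age. This is a sketch of the idea only: the timestamps and sample events are hypothetical, and the buckets here are aligned to multiples of 300 seconds, which may differ from how Esper anchors a `time_batch` window.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 5 * 60


def count_by_age_per_window(timed_events):
    """Group (timestamp, event) pairs into 5-minute batches; count per age."""
    windows = defaultdict(Counter)
    for ts, event in timed_events:
        if event.get("current") != "Fukuoka":
            continue  # WHERE current="Fukuoka"
        window_start = ts - ts % WINDOW_SECONDS
        windows[window_start][event["age"]] += 1  # GROUP BY age, COUNT(*)
    return {
        start: [
            {"age": age, "cnt": cnt}
            for age, cnt in sorted(counts.items(), reverse=True)
        ]
        for start, counts in sorted(windows.items())
    }


# Hypothetical events as (seconds since start, event) pairs.
timed_events = [
    (0, {"age": 34, "current": "Fukuoka"}),
    (60, {"age": 34, "current": "Fukuoka"}),
    (120, {"age": 33, "current": "Fukuoka"}),
    (200, {"age": 34, "current": "Fukuoka"}),
    (250, {"age": 40, "current": "Tokyo"}),
    (310, {"age": 33, "current": "Fukuoka"}),
]
batches = count_by_age_per_window(timed_events)
print(batches)
```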
    • Calculate as STREAM on demand; Collect and deliver as STREAM; Calculate as BATCH; Monitor as STREAM; Calculate as BATCH immediately on demand
    • 5th gen: +Presto [diagram]. Components: Web Servers, Fluentd Cluster, Archive Storage (scribed), Fluentd Watchers, Notifications (IRC), Graph Tools, Norikra, webhdfs, Hadoop Cluster (HDFS, MR), hive server, Huahin Manager, Presto Cluster, Shib, ShibUI. Flow labels: STREAM, BATCH, SCHEDULED BATCH.
    • Presto. Open sourced by Facebook on 2013/11/07; MPP (Massively Parallel Processing) engine, like Google BigQuery (Dremel) and Cloudera Impala; short-latency queries (not the main use of Hive); SQL; HTTP JSON API; Java 7!
    • Shib v0.3.0: presto support [diagram]. Components: User (browser), Analysis Batches, Service Admin Tools, Shib, HiveServer, HiveServer2, Presto. Connection labels: THRIFT, HTTP JSON API.
    • Non-monolithic architecture. Many subsystems for many purposes; add/update/replace per subsystem; high interoperability through RPC-based connections; a gateway can hide backend implementations
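The point that a gateway can hide backend implementations can be illustrated with a small sketch. All class names, the table name, and the echo-style `run` methods are hypothetical stand-ins; this is not Shib's actual code or API, only the general gateway pattern the slide describes.

```python
class HiveBackend:
    """Hypothetical stand-in for a batch engine reached over Thrift."""

    name = "hive"

    def run(self, sql):
        # A real backend would submit the query; this sketch only echoes it.
        return {"engine": self.name, "sql": sql}


class PrestoBackend:
    """Hypothetical stand-in for a low-latency engine reached over HTTP."""

    name = "presto"

    def run(self, sql):
        return {"engine": self.name, "sql": sql}


class QueryGateway:
    """Single entry point that hides which backend executes a query."""

    def __init__(self, backends, default):
        self._backends = {backend.name: backend for backend in backends}
        self._default = default

    def execute(self, sql, engine=None):
        backend = self._backends[engine or self._default]
        return backend.run(sql)

    def replace(self, backend):
        # Swap or add a subsystem without touching any caller.
        self._backends[backend.name] = backend


gateway = QueryGateway([HiveBackend(), PrestoBackend()], default="hive")
batch = gateway.execute("SELECT COUNT(*) FROM access_log")
quick = gateway.execute("SELECT COUNT(*) FROM access_log", engine="presto")
print(batch["engine"], quick["engine"])
```

Because callers only see `execute`, a backend can be upgraded or replaced behind the gateway without changing the tools that submit queries.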
    • WHAT WE SHOULD DO IS NOT WHAT WE WANT TO DO, BUT WHAT WE ARE WANTED TO DO.
    • THERE ARE MANY THINGS TO DO! THANKS!
    • Software list: http://fluentd.org/ http://prestodb.io/ http://norikra.github.io/ https://github.com/tagomoris/shib