Log analysis with Hadoop in livedoor 2013

Log analysis system
with Hadoop
in livedoor 2013 Winter
2013/01/20
Hadoop Conference Japan 2013 Winter

TAGOMORI Satoshi (@tagomoris)
NHN Japan Corp.
Monday, January 21, 2013

TAGOMORI SATOSHI (@TAGOMORIS)
NHN JAPAN CORP.
WEB SERVICE BUSINESS DIVISION DEVELOPMENT DEPARTMENT 2
(IN JAN 2012, LIVEDOOR -> NHN JAPAN)


livedoor in NHN Japan


large scale web services
400+ Web Servers

5Gbps @ Aug 2009
15Gbps @ Aug 2011
20+Gbps @ Jan 2013
(direct outbound + CDN)


giant access log traffic

At Aug 2011 (HCJ2011)
From 96 servers
580GB/day


giant access log traffic
NOW (At Jan 2013 HCJ2013W)
From 320+ servers
1.5+ TB/day (raw)
5,300,000,000+ lines/day
120,000+ lines/sec (peak time)
400Mbps log traffic
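As a rough sanity check, the daily totals above can be converted into average rates. This is only a sketch: it assumes decimal units (1 TB = 10^12 bytes) and an evenly spread 24-hour day, neither of which the slide states.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_lines_per_sec(lines_per_day: float) -> float:
    """Average line rate if the daily volume were spread evenly."""
    return lines_per_day / SECONDS_PER_DAY


def average_mbps(bytes_per_day: float) -> float:
    """Average bandwidth in megabits per second (decimal units assumed)."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e6


avg_lines = average_lines_per_sec(5_300_000_000)  # 5.3 billion lines/day
avg_rate = average_mbps(1.5e12)                   # 1.5 TB/day, raw

print(f"{avg_lines:,.0f} lines/sec on average")
print(f"{avg_rate:.0f} Mbps on average")
```

Under these assumptions the average is about 61,000 lines/sec, so the quoted peak of 120,000+ lines/sec is roughly twice the daily average.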

What we want to do
COUNT PV,UU and others (daily)
COUNT Service metrics (daily/hourly)
FIND Unexpected Errors [4xx,5xx] (immediately)
CHECK Response Times (immediately)
SEARCH Logs when troubles occur (hourly/immediately)


Batches and Streams
Hadoop is for batches
High performance batch is important
HDFS has good performance
Stream log writing and calculations
are also VERY VERY IMPORTANT
Hybrid System:
Stream processing + Batch

System Overview
[Diagram]
Components: Web Servers, Fluentd Cluster, Archive Storage, Fluentd Watchers, Notifications (IRC), Graph Tools, webhdfs, Hadoop Cluster, hive server, Shib, ShibUI, Huahin Manager
Labels: STREAM, BATCH, SCHEDULED BATCH, (scribed), (HDFS, YARN)


Hadoop in livedoor 2013
18 nodes (Master 3 + Slave 15)
120core, 180GB RAM, 100TB HDFS
CDH4.1.2
NameNode HA(QJM), WebHDFS
YARN, Hive + HiveServer1


Fluentd in livedoor 2013
16 nodes (Deliver 4 + Worker 10 + Watcher 2)
Fluentd (latest release / trunk)
Ruby based message transfer daemon
Many plugins from rubygems.org


Hadoop/Fluentd engineer
in livedoor 2013

1 person.


Processes Overview
Log collection / Archiving
Parse / Transform / Add flags
Load into Hive tables
On-demand queries
Scheduled queries
Stream aggregations + Notifications

Past and present
1st gen: Fully batch (late 2011)

Scribed + Hadoop

2nd gen: Partially stream processing (early 2012)

Fluentd + Hadoop

3rd gen: Fully stream processing (late 2012)

Fluentd + Hadoop + Graph Tools

4th gen: New Cluster with CDH4 (early 2013)


BREAK.


1st gen: First impl.
[Diagram]
Components: Web Servers, Scribed, Archive Storage, Hadoop Cluster, hive server, Shib
Labels: STREAM, (LIBHDFS), BATCH, (scribed), CDH3b2, (Hadoop Streaming)


Shib: Hive Web Client

https://github.com/tagomoris/shib

1st gen: Fully batch
Log collection / Archiving: Scribed (libhdfs)

Parse / Transform / Add flags: Hadoop Streaming

On-demand queries: HiveServer + Shib

Scheduled queries

1st gen: Fully batch
Simplicity: easy to implement
Shib: easy to run on-demand query
Latency: hourly rotation + import batch
Performance: import batch needs CPU
Scribed: libhdfs dependency problem


2nd gen: +Fluentd
[Diagram]
Components: Web Servers, Fluentd Cluster, Archive Storage, Cloudera Hoop, Hadoop Cluster, hive server, Shib, Huahin Manager
Labels: STREAM, BATCH, CDH3u2, (Hive)


Fluentd stream processing
out_exec_filter
any filter programs with STDIN/STDOUT
compatible with Hadoop Streaming!
out_hoop
output plugin to write HDFS over Hoop
Hoop: a.k.a. HttpFs in Hadoop 2.0.x
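The STDIN/STDOUT contract above can be illustrated with a minimal filter sketch. The column layout (status, response time in microseconds, user agent), the crawler hints, and the added flag columns are all hypothetical; the slides do not describe the actual filter programs.

```python
import io
from typing import TextIO

# Hypothetical TSV input layout: status, response time (microseconds), user agent.
CRAWLER_HINTS = ("bot", "crawler", "spider")  # illustrative substrings only


def transform_line(line: str) -> str:
    """Append two flag columns: the status class and a crawler flag."""
    status, response_us, agent = line.rstrip("\n").split("\t")
    status_class = status[0] + "xx"  # "404" -> "4xx"
    is_crawler = any(hint in agent.lower() for hint in CRAWLER_HINTS)
    return "\t".join([status, response_us, agent, status_class, str(is_crawler).lower()])


def run(source: TextIO, sink: TextIO) -> None:
    """Read records line by line and write transformed records immediately."""
    for line in source:
        if line.strip():
            sink.write(transform_line(line) + "\n")
            sink.flush()  # flush per record so a streaming consumer is not stalled


# Demo with in-memory streams; a real filter would call run(sys.stdin, sys.stdout).
demo_in = io.StringIO("200\t1532\tMozilla/5.0\n404\t820\tExampleBot/1.0\n")
demo_out = io.StringIO()
run(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

Because the program only reads lines and writes lines, the same script shape would work both as a child process of a stream plugin and as a Hadoop Streaming mapper, which is the compatibility the slide points out.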

Fluentd stream processing
[Diagram]
Components: Web Servers, Fluentd deliver nodes, Fluentd worker nodes, Hoop Server, HDFS

Huahin Manager
REST API for:
JobTracker (MRv1)
ResourceManager (YARN)
HiveServer

http://huahinframework.org/huahin-manager/


2nd gen: +Fluentd
Log collection / Archiving: Fluentd

Parse / Transform / Add flags: Fluentd

HiveServer

Scheduled queries

2nd gen: +Fluentd
Compatibility:
RPC based HDFS/JobTracker Access
Performance: import needs no CPU
(Load Only)
Latency: hourly rotation only
Latency: hourly rotation for any queries
Hoop Server: SPOF / traffic bottleneck

3rd gen: ++++++
[Diagram]
Components: Web Servers, Fluentd Cluster, Archive Storage, Fluentd Watchers, Notifications (IRC), Graph Tools, webhdfs, Hadoop Cluster, hive server, Shib, ShibUI, Huahin Manager
Labels: STREAM, BATCH, SCHEDULED BATCH, CDH3u5, (Hive)


WebHDFS (CDH3u5 or CDH4)
[Diagram]
HttpFs (Hoop): Client -> (HTTP) -> httpfs server -> (Java Native) -> NameNode, DataNodes
WebHDFS: Client -> (HTTP) -> NameNode, DataNodes
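With WebHDFS the client talks HTTP to the NameNode directly, with no proxy server in between. A minimal sketch of how such a request URL is formed, following the WebHDFS REST convention of `/webhdfs/v1/<path>?op=...`. The host name, port, file path, and user name below are placeholders, not values from the slides.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host: str, port: int, hdfs_path: str, op: str, **params: str) -> str:
    """Build a WebHDFS REST URL for the given HDFS path and operation."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(hdfs_path)}?{query}"


# Placeholder NameNode address and log path (assumptions for illustration).
url = webhdfs_url(
    "namenode.example", 50070,
    "/logs/access/20130120_10.log",
    "APPEND",
    **{"user.name": "fluentd"},
)
print(url)

# For CREATE and APPEND the NameNode answers with a redirect whose Location
# header points at a DataNode; the client then sends the data to that URL.
```

The redirect step is why the client needs HTTP access to the DataNodes as well as the NameNode, and also why no single intermediate server carries all the log traffic.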

Fluentd online aggregation

Semi-realtime aggregation to:
counts errors of HTTP response
calculate avg/%tiles of response time
draw graphs immediately
Many plugins for real time aggregation
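The aggregations listed above can be sketched for a single time window as follows. This is not the Fluentd plugin code; the record shape (status, response time) and the nearest-rank percentile method are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

Record = Tuple[int, int]  # (HTTP status, response time); hypothetical record shape


def percentile(sorted_values: List[int], pct: int) -> int:
    """Nearest-rank percentile over an already sorted, non-empty list."""
    rank = max(1, (pct * len(sorted_values) + 99) // 100)  # integer ceiling
    return sorted_values[rank - 1]


def summarize(records: List[Record]) -> Dict[str, object]:
    """Count status classes and compute avg / percentiles for one window."""
    counts = Counter(f"{status // 100}xx" for status, _ in records)
    times = sorted(t for _, t in records)
    summary: Dict[str, object] = {
        "counts": dict(counts),
        "avg": sum(times) / len(times),
    }
    for pct in (90, 95, 98, 99):
        summary[f"p{pct}"] = percentile(times, pct)
    return summary


# One small window of made-up records.
window = [
    (200, 100), (200, 120), (302, 80), (404, 60), (500, 900),
    (200, 110), (200, 130), (200, 90), (200, 140), (200, 150),
]
result = summarize(window)
print(result)
```

In this made-up window a single slow request barely moves p90 but dominates p95 and above, which is why the percentile lines are worth graphing next to the average.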


Graph Tools:
GrowthForecast / HRForecast

Graph drawing tools to update values
over very simple HTTP request
GrowthForecast: Real-time values
HRForecast: Summarized (past) values
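The "very simple HTTP request" can be sketched as below, assuming GrowthForecast's documented update endpoint of the form `/api/<service>/<section>/<graph>` with a `number` form parameter. The base URL and the service, section, and graph names are placeholders, and the request is only constructed here, not sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def growthforecast_request(base_url: str, service: str, section: str,
                           graph: str, number: int, mode: str = "gauge") -> Request:
    """Build (but do not send) a POST that updates one graph value."""
    path = "/".join(quote(part, safe="") for part in (service, section, graph))
    body = urlencode({"number": number, "mode": mode}).encode("ascii")
    return Request(f"{base_url}/api/{path}", data=body, method="POST")


# Placeholder host and graph names (assumptions for illustration).
req = growthforecast_request("http://gf.example:5125", "blog", "http_status", "5xx_count", 42)
print(req.get_method(), req.full_url, req.data.decode())

# Sending it would be: urllib.request.urlopen(req)
```

Since an update is a single POST with one number, any aggregation process can feed a graph without a client library.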


HTTP Status/Response Time
on GrowthForecast
HTTP STATUS: 2XX(BLUE),3XX(GREEN),4XX(ORANGE), 5XX(RED)

HTTP RESPONSE TIMES: AVG, [90, 95, 98, 99]PERCENTILE

http://kazeburo.github.com/GrowthForecast/

ShibUI


ShibUI

https://github.com/kazeburo/hrforecast


3rd gen: +++++++
Log collection / Archiving: Fluentd

Parse / Transform / Add flags: Fluentd

HiveServer

Scheduled queries: ShibUI
Fluentd

3rd gen: +++++++
NO SPOF: for data stream
Real time monitoring
Queries for services:
Scheduled queries, Visualization
Latency: hourly rotation for any queries
SPOF: NameNode (VIP & DRBD is xxxx...)

4th gen: NOW
[Diagram]
Components: Web Servers, Fluentd Cluster, Archive Storage, Fluentd Watchers, Notifications (IRC), Graph Tools, webhdfs, Hadoop Cluster, hive server, Shib, ShibUI, Huahin Manager
Labels: STREAM, BATCH, SCHEDULED BATCH, CDH4, (HDFS, YARN)


4th gen: CDH4.1.2
NO SPOF: QJM based NameNode HA
Performance: YARN (?)
Latency: multiple rotation in an hour
with hive table schema change
NONE should be improved!


Good parts for solo engineer:

RPC: Loosely-coupled architecture
High compatibility / Low maintenance cost

Open Source
All components are OSS

Open knowledge
Well blogged / presented


OUR DRIVER IS
"OPENNESS"

thanks to crouton & @kbysmnr !

Software list:

https://ccp.cloudera.com/display/SUPPORT/Downloads
http://fluentd.org/
http://fluentd.org/plugin/
https://github.com/tagomoris/fluent-agent-lite
https://github.com/tagomoris/shib
https://github.com/tagomoris/shibui
http://huahinframework.org/huahin-manager/
http://kazeburo.github.com/GrowthForecast/
http://github.com/kazeburo/hrforecast


See also:
Hadoop and Subsystem in livedoor (2011)
http://www.slideshare.net/tagomoris/hadoop-and-subsystems-in-livedoor-hcj11f

Distributed message stream processing on Fluentd
http://www.slideshare.net/tagomoris/distributed-stream-processing-on-fluentd-fluentd

Hive Tools in NHN Japan
http://www.slideshare.net/tagomoris/hive-tools-in-nhn-japan-hadoopreading

OSS based large scale log aggregation in livedoor
http://www.slideshare.net/tagomoris/oss-nhntech

Fluentd and WebHDFS
http://www.slideshare.net/tagomoris/fluentd-and-webhdfs

