Prdc2012

Method for Monitoring and Profiling of Hadoop Using AspectJ

Yusuke Shimizu, Kouhei Sakurai, Satoshi Yamane
Graduate School of Natural Science & Technology,
Kanazawa University

PRDC2012@TOKIMESSE

Introduction
Large-scale distributed systems are being used in more and more settings

A large-scale distributed system is ...

“Flexible and available architecture for large
scale computation and data processing on a
network of commodity hardware”
[-- P. Julio, 2009]

- e.g. Apache Hadoop

For a dependable distributed system,
we have to consider and deal with:
- a non-deterministic network
- fault tolerance, which is difficult
- users whose behavior is hard to predict
Static analysis or verification done in advance is not enough on its own.

We also need runtime monitoring and analysis

How to monitor and debug

The general methods for debugging or monitoring
Hadoop are:
• logging text messages

• checking metrics via web interfaces, Ganglia, etc.

Difficulties and requirements

• Logging text messages
→ hard to manage across a huge number of nodes
• Checking metrics via web interfaces, Ganglia, etc.
→ aimed at operators; not sufficient for developers


Proposal

1. The method-level monitor

2. The adaptive profiling method

- Provides effective information for development
- Helps developers understand system behaviors
and specifications

Outline of Talk

Introduction
- Difficulties of distributed systems
Proposal
- Monitor
- Profiling method
Experimental Results & Conclusion

2. PROPOSALS

The Runtime Monitor

&

The Adaptive Profiling Method

Outline of Proposed System
Hadoop → Monitor → Profile
• Hadoop: MapReduce, HDFS, RPC
• Monitor: records a trace using AspectJ
• Profile: counts up the frequency of instructions

Monitor

• observes system behavior at runtime
• passively logs executed instructions, producing a “trace”
‣ implemented with AspectJ
- AspectJ is an implementation of aspect-oriented programming (AOP) for Java
‣ requires no modification of the applications
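A minimal sketch of what such a passive monitor records. In AspectJ this would typically be a `before()` advice on an `execution(...)` pointcut over the Hadoop classes (for example `execution(* org.apache.hadoop..*.*(..))`), woven in with the AspectJ compiler or load-time weaver; the slides do not give the authors' actual pointcut. Because AspectJ weaving needs its own toolchain, this self-contained version swaps in JDK dynamic proxies (`java.lang.reflect.Proxy`) to intercept calls on an interface. `NameNodeLike`, `TraceEntry`, and `TracingMonitor` are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface standing in for a Hadoop daemon; not a real Hadoop API.
interface NameNodeLike {
    String getFileInfo(String path);
    int sendHeartbeat(String dataNode);
}

// One trace entry: when a method was entered and which method it was.
record TraceEntry(long timestampMillis, String method) {}

final class TracingMonitor {
    private final List<TraceEntry> trace = new ArrayList<>();

    // Returns a proxy that appends a trace entry and then delegates each call to
    // `target`, roughly what a before-execution advice does in AspectJ.
    <T> T wrap(Class<T> type, T target) {
        InvocationHandler handler = (proxy, method, args) -> {
            synchronized (trace) {
                trace.add(new TraceEntry(System.currentTimeMillis(),
                        type.getSimpleName() + "." + method.getName()));
            }
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the application's own exception unchanged
            }
        };
        return type.cast(Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] {type}, handler));
    }

    // Copy of the trace recorded so far.
    List<TraceEntry> snapshot() {
        synchronized (trace) {
            return new ArrayList<>(trace);
        }
    }

    public static void main(String[] args) {
        TracingMonitor monitor = new TracingMonitor();
        NameNodeLike nn = monitor.wrap(NameNodeLike.class, new NameNodeLike() {
            public String getFileInfo(String path) { return "file:" + path; }
            public int sendHeartbeat(String dataNode) { return 0; }
        });
        nn.getFileInfo("/user/input");
        nn.sendHeartbeat("datanode-1");
        monitor.snapshot().forEach(e -> System.out.println(e.method()));
    }
}
```

Unlike bytecode weaving, a proxy only sees calls made through the proxied interface, so calls internal to a daemon would be missed; weaving avoids that limitation while still leaving the application source untouched.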

Architecture of Hadoop & Monitor
[Figure: the master runs the NameNode and the JobTracker; each slave runs a DataNode (holding data blocks) and a TaskTracker (running Map and Reduce tasks); master and slaves communicate over RPC, and a Monitor runs on the master and on each slave.]


Master’s trace
‣ NameNode trace
‣ JobTracker trace
‣ RPC trace
Slaves’ trace
‣ DataNode trace
‣ TaskTracker trace
‣ RPC trace
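One way the recorded methods could be sorted into these per-daemon traces, sketched under an assumed rule: calls into Hadoop's `ipc` package go to the RPC trace, and everything else goes to the trace of the daemon whose JVM recorded it. The rule, the `Daemon` enum, and `TraceClassifier` are hypothetical; the category names (`namenodetrace`, `rpctrace`, and so on) are taken from the profiling excerpt in these slides, and method names are written relative to `org.apache.hadoop` as in that excerpt.

```java
import java.util.Locale;

// Daemons from the architecture slide; onMaster selects master's vs. slaves' trace.
enum Daemon {
    NAMENODE(true), JOBTRACKER(true), DATANODE(false), TASKTRACKER(false);

    final boolean onMaster;

    Daemon(boolean onMaster) {
        this.onMaster = onMaster;
    }
}

final class TraceClassifier {
    // Hypothetical rule: IPC calls form the RPC trace; the rest belong to the
    // trace of the daemon that recorded them.
    static String categoryOf(Daemon daemon, String method) {
        if (method.startsWith("ipc.")) {
            return "rpctrace";
        }
        return daemon.name().toLowerCase(Locale.ROOT) + "trace";
    }

    // "master" for NameNode/JobTracker traces, "slave" for DataNode/TaskTracker traces.
    static String side(Daemon daemon) {
        return daemon.onMaster ? "master" : "slave";
    }

    public static void main(String[] args) {
        System.out.println(categoryOf(Daemon.NAMENODE, "hdfs.server.namenode.NameNode.getFileInfo"));
        System.out.println(categoryOf(Daemon.JOBTRACKER, "ipc.Client.call"));
    }
}
```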

Method of Profiling

• based on the frequency of instructions
• counts up the instructions recorded in the “trace”
• counts at each granularity:
➡ each node
➡ each process
➡ each method
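The per-granularity counting can be sketched as three frequency tables filled in a single pass over the trace. This is an illustration, not the authors' code; `TraceRecord`, `FrequencyProfiler`, and the `host/process/method` key format are assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// One trace record: which node, which daemon process, which method.
record TraceRecord(String host, String process, String method) {}

final class FrequencyProfiler {
    final Map<String, Integer> perNode = new TreeMap<>();
    final Map<String, Integer> perProcess = new TreeMap<>();
    final Map<String, Integer> perMethod = new TreeMap<>();

    // Counts one record at all three granularities at once.
    void count(TraceRecord r) {
        String node = r.host();
        String process = node + "/" + r.process();
        String method = process + "/" + r.method();
        perNode.merge(node, 1, Integer::sum);
        perProcess.merge(process, 1, Integer::sum);
        perMethod.merge(method, 1, Integer::sum);
    }

    static FrequencyProfiler of(List<TraceRecord> trace) {
        FrequencyProfiler p = new FrequencyProfiler();
        trace.forEach(p::count);
        return p;
    }

    public static void main(String[] args) {
        FrequencyProfiler p = of(List.of(
                new TraceRecord("sirius", "namenode", "NameNode.getFileInfo"),
                new TraceRecord("sirius", "namenode", "NameNode.getFileInfo"),
                new TraceRecord("sirius", "rpc", "Client.call")));
        System.out.println(p.perNode);
        System.out.println(p.perProcess);
        System.out.println(p.perMethod);
    }
}
```

Prefixing each finer key with the coarser ones keeps counts for the same method on different nodes separate, which the node- and process-level plots need.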


3. EXPERIMENT

Benchmarking the impact of the Monitor
&
Profiling
&
Visualizing the profiling results

Benchmark: the impact of the Monitor

Throughput [MB/sec] = Data size / Elapsed time

Data size  Monitor  Elapsed time             Throughput  Throughput ratio  Trace size
[GB]                                         [MB/sec]    (on / off)        [MB]
1          on       2m 25s (145 sec)         6.9         84.1%             2.4
1          off      2m 2s (122 sec)          8.2                           0
10         on       8m 45s (525 sec)         19.0        88.3%             3.6
10         off      7m 45s (465 sec)         21.5                          0
100        on       1h 21m 54s (4,914 sec)   20.4        96.2%             31.6
100        off      1h 18m 37s (4,717 sec)   21.2                          0

The throughput ratio is the throughput with the Monitor divided by the throughput without it (e.g. 6.9 / 8.2 ≈ 84.1%).

Workload: “terasort”, a sample sorting program using MapReduce
With the Monitor enabled, the trace grows by about 6.43 KB/sec (31.6 MB over 4,914 sec in the 100 GB run).
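The table's arithmetic can be checked with a few lines. The reported throughputs are consistent with 1 GB = 1,000 MB (1,000 / 145 ≈ 6.9, whereas 1,024 / 145 ≈ 7.1), so that conversion is assumed below, as is 1 MB = 1,000 KB for the trace growth rate; `BenchmarkMath` is a hypothetical helper.

```java
final class BenchmarkMath {
    // Throughput [MB/sec] = data size / elapsed time, assuming 1 GB = 1,000 MB.
    static double throughputMBps(double dataGB, double elapsedSec) {
        return dataGB * 1000.0 / elapsedSec;
    }

    // Average trace growth rate [KB/sec], assuming 1 MB = 1,000 KB.
    static double traceGrowthKBps(double traceMB, double elapsedSec) {
        return traceMB * 1000.0 / elapsedSec;
    }

    public static void main(String[] args) {
        // {data size [GB], elapsed time [sec]} for each run: monitor on, then off.
        double[][] runs = {{1, 145}, {1, 122}, {10, 525}, {10, 465}, {100, 4914}, {100, 4717}};
        for (double[] r : runs) {
            System.out.printf("%5.0f GB  %6.0f s  %.1f MB/sec%n", r[0], r[1], throughputMBps(r[0], r[1]));
        }
        System.out.printf("trace growth (100 GB run): %.2f KB/sec%n", traceGrowthKBps(31.6, 4914));
    }
}
```

Running `main` prints throughputs of 6.9, 8.2, 19.0, 21.5, 20.4, and 21.2 MB/sec, matching the table, and a trace growth rate of 6.43 KB/sec.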

Part of the Profiling Output
Statistics for the last 10 seconds on the master:
Tue Nov 13 12:30:08 JST 2012
from 1352777408766 until 10000 after
HOSTNAME ::> DAEMON & PROCESS = { METHODS }
--------------------------
sirius:177 ::>>
[namenodetrace : 23, jobtrackertrace : 41, datanodetrace : 0,
tasktrackertrace : 0, rpctrace : 113]
={
! hdfs.server.namenode.CorruptReplicasMap.numCorruptReplicas=5
! hdfs.server.namenode.FSNamesystem.getBlockLocations=3
! hdfs.server.namenode.FSNamesystem.getDatanode=1
! hdfs.server.namenode.NameNode.getBlockLocations=4
! hdfs.server.namenode.NameNode.getFileInfo=2
! hdfs.server.namenode.NameNode.sendHeartbeat=2
! hdfs.server.namenode.NameNode.verifyVersion=3
! hdfs.server.namenode.UnderReplicatedBlocks.BlockIterator.hasNext=2
! hdfs.server.namenode.UnderReplicatedBlocks.BlockIterator.next=1
! ipc.Client.Connection.PingInputStream.read=4
! ipc.Client.Connection.sendParam=2
! ipc.Client.call=1
! ipc.ConnectionHeader.readFields=4
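A sketch of how a report in this format could be produced for one host and one 10-second window. The layout imitates the excerpt: the number after the host name is the total for the window, which in the excerpt equals the sum of the five category counts (23 + 41 + 0 + 0 + 113 = 177). `TimedEntry` and `WindowReport` are hypothetical; the slides do not show the tool's own output code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// A trace entry tagged with its node and trace category.
record TimedEntry(long timeMillis, String host, String category, String method) {}

final class WindowReport {
    // Category order copied from the excerpt.
    static final List<String> CATEGORIES = List.of(
            "namenodetrace", "jobtrackertrace", "datanodetrace", "tasktrackertrace", "rpctrace");

    // Summarizes one host's entries in [fromMillis, fromMillis + windowMillis).
    static String render(List<TimedEntry> entries, String host, long fromMillis, long windowMillis) {
        Map<String, Integer> perCategory = new LinkedHashMap<>();
        for (String c : CATEGORIES) {
            perCategory.put(c, 0);
        }
        Map<String, Integer> perMethod = new TreeMap<>();
        int total = 0;
        for (TimedEntry e : entries) {
            boolean inWindow = e.timeMillis() >= fromMillis && e.timeMillis() < fromMillis + windowMillis;
            if (!inWindow || !e.host().equals(host)) {
                continue;
            }
            perCategory.merge(e.category(), 1, Integer::sum);
            perMethod.merge(e.method(), 1, Integer::sum);
            total++;
        }
        List<String> parts = new ArrayList<>();
        perCategory.forEach((c, n) -> parts.add(c + " : " + n));
        StringBuilder sb = new StringBuilder();
        sb.append(host).append(':').append(total).append(" ::>>\n");
        sb.append('[').append(String.join(", ", parts)).append("]\n={\n");
        perMethod.forEach((m, n) -> sb.append("  ").append(m).append('=').append(n).append('\n'));
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        long from = 1352777408766L; // window start taken from the excerpt
        List<TimedEntry> entries = List.of(
                new TimedEntry(from + 120, "sirius", "namenodetrace", "hdfs.server.namenode.NameNode.getFileInfo"),
                new TimedEntry(from + 450, "sirius", "rpctrace", "ipc.Client.call"));
        System.out.println(render(entries, "sirius", from, 10_000));
    }
}
```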

Node-Level Profiling
Node-level profiling aggregates the frequencies of instructions within each node per unit of time.
[Figure: number of occurrences over time (s), 0 to 6420 s, one series per node: 192.168.1.10 to 192.168.1.15]
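Node-level profiling amounts to bucketing trace entries by node and by unit of time. The sketch below assumes timestamps in epoch milliseconds and takes the unit as a parameter (for example 10,000 ms, matching the 10-second window of the profiling excerpt); `NodeHit` and `NodeLevelProfile` are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// A trace entry reduced to what node-level profiling needs.
record NodeHit(long timeMillis, String host) {}

final class NodeLevelProfile {
    // host -> (bucket start in seconds since startMillis -> number of occurrences)
    static Map<String, TreeMap<Long, Integer>> series(List<NodeHit> hits, long startMillis, long unitMillis) {
        Map<String, TreeMap<Long, Integer>> out = new TreeMap<>();
        for (NodeHit h : hits) {
            if (h.timeMillis() < startMillis) {
                continue; // ignore entries recorded before the run started
            }
            long bucketStartSec = (h.timeMillis() - startMillis) / unitMillis * unitMillis / 1000;
            out.computeIfAbsent(h.host(), k -> new TreeMap<>())
               .merge(bucketStartSec, 1, Integer::sum);
        }
        return out;
    }

    public static void main(String[] args) {
        List<NodeHit> hits = List.of(
                new NodeHit(0, "192.168.1.10"), new NodeHit(4_000, "192.168.1.10"),
                new NodeHit(12_000, "192.168.1.10"), new NodeHit(3_000, "192.168.1.11"));
        System.out.println(series(hits, 0, 10_000));
    }
}
```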

Process-Level Profiling of the Master

Process-level profiling aggregates the frequencies of instructions for each process within each node per unit of time.
[Figure: Master; number of occurrences over time (s), 0 to 6420 s; series: rpc, jobtracker, namenode]
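For plotting one node's processes, the same bucketing can be keyed by process and written out as one CSV row per time unit. The column names follow the figure's legend (`rpc`, `jobtracker`, `namenode`); the CSV layout, `ProcessHit`, and `ProcessLevelCsv` are assumptions for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// A trace entry reduced to its timestamp and the process that produced it.
record ProcessHit(long timeMillis, String process) {}

final class ProcessLevelCsv {
    // One CSV row per time bucket for a single node; empty buckets are omitted.
    static String toCsv(List<ProcessHit> hits, List<String> processes, long startMillis, long unitMillis) {
        TreeMap<Long, int[]> buckets = new TreeMap<>();
        for (ProcessHit h : hits) {
            int column = processes.indexOf(h.process());
            if (column < 0 || h.timeMillis() < startMillis) {
                continue; // unknown process, or entry before the run started
            }
            long bucketStartSec = (h.timeMillis() - startMillis) / unitMillis * unitMillis / 1000;
            buckets.computeIfAbsent(bucketStartSec, k -> new int[processes.size()])[column]++;
        }
        StringBuilder sb = new StringBuilder("time(s)," + String.join(",", processes) + "\n");
        for (Map.Entry<Long, int[]> e : buckets.entrySet()) {
            sb.append(e.getKey());
            for (int n : e.getValue()) {
                sb.append(',').append(n);
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<ProcessHit> hits = List.of(
                new ProcessHit(100, "rpc"), new ProcessHit(300, "namenode"),
                new ProcessHit(10_100, "jobtracker"));
        System.out.print(toCsv(hits, List.of("rpc", "jobtracker", "namenode"), 0, 10_000));
    }
}
```

Omitting empty buckets keeps the file small; a plotting tool that needs evenly spaced points would have to fill the gaps with zeros.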

Process-Level Profiling of the Slaves
[Figure: one panel per slave (192.168.1.11 to 192.168.1.15) showing the number of occurrences of rpctrace, tasktrackertrace, and datanodetrace over time (s), up to 6420 s; the panel for 192.168.1.11 marks the map phase and the reduce phase.]
Annotations on the figure:
• “There are free resources; speculative execution should be used.” (near the panels for 192.168.1.12 and 192.168.1.13)
• “Imbalance of RPC” (near the panels for 192.168.1.14 and 192.168.1.15)
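The slides identify the idle slaves and the RPC imbalance by eye. A simple way to flag such nodes automatically is to compare each slave's count in a window with the busiest slave's count. This rule, `RpcImbalance`, the 0.5 threshold, and the counts in the example are hypothetical and are not read from the figure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class RpcImbalance {
    // Hosts whose count is below `fraction` of the busiest host's count,
    // sorted by host name.
    static List<String> underused(Map<String, Integer> countsPerHost, double fraction) {
        int max = 0;
        for (int n : countsPerHost.values()) {
            max = Math.max(max, n);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : countsPerHost.entrySet()) {
            if (e.getValue() < fraction * max) {
                out.add(e.getKey());
            }
        }
        out.sort(null); // natural (alphabetical) order
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical RPC counts for one time window.
        Map<String, Integer> rpc = Map.of(
                "192.168.1.12", 40, "192.168.1.13", 35,
                "192.168.1.14", 150, "192.168.1.15", 150);
        System.out.println(underused(rpc, 0.5));
    }
}
```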

Conclusion
Summary
• Proposals
- a lightweight method-level monitor using AspectJ
- a profiling method based on the frequency of instructions
• Provide effective information for development
• Help developers understand system behaviors and
specifications
Future work
• Create an algorithm that uses profiling results to determine the degree of deviation indicating a possible failure.
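As one possible starting point for this future work (not the authors' planned algorithm), the degree of deviation could be measured with a standard z-score: each node's count in a window is compared with the mean and population standard deviation over all nodes in the same window. `DeviationScore`, the counts, and the 1.5 limit are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class DeviationScore {
    // z-score of each node's count against all nodes in the same window.
    // Every score is 0 when all counts are equal (standard deviation of 0).
    static Map<String, Double> zScores(Map<String, Integer> counts) {
        Map<String, Double> z = new TreeMap<>();
        if (counts.isEmpty()) {
            return z;
        }
        double mean = 0;
        for (int c : counts.values()) {
            mean += c;
        }
        mean /= counts.size();
        double variance = 0;
        for (int c : counts.values()) {
            variance += (c - mean) * (c - mean);
        }
        double sd = Math.sqrt(variance / counts.size());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            z.put(e.getKey(), sd == 0 ? 0.0 : (e.getValue() - mean) / sd);
        }
        return z;
    }

    // Nodes whose absolute z-score exceeds `limit`.
    static List<String> flagged(Map<String, Double> z, double limit) {
        List<String> out = new ArrayList<>();
        z.forEach((host, score) -> {
            if (Math.abs(score) > limit) {
                out.add(host);
            }
        });
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = Map.of(
                "192.168.1.11", 100, "192.168.1.12", 105, "192.168.1.13", 98, "192.168.1.14", 400);
        Map<String, Double> z = zScores(counts);
        System.out.println(z);
        System.out.println(flagged(z, 1.5));
    }
}
```

A z-score ignores how a node's own counts change over time; comparing each window against that node's history would be a natural alternative.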

Thank you for your kind attention
