NameNode and DataNode Coupling for a Power-proportional Hadoop Distributed File System

NAMENODE AND DATANODE COUPLING
FOR A POWER-PROPORTIONAL
HADOOP DISTRIBUTED FILE SYSTEM
Hieu Hanh Le, Satoshi Hikida and Haruo Yokota
Tokyo Institute of Technology
Appeared in DASFAA 2013
The 18th International Conference on Database
Systems for Advanced Applications (Wuhan, China)

Agenda
 Background
 Research Motivation
 Goal and Approach
 Proposals
 Experimental Evaluation
 Conclusion

Background
 Hadoop Distributed File System (HDFS) is widely used as data storage for applications in the Cloud
   Commercial off-the-shelf (COTS)-based system
   Supports the MapReduce framework
   Good scalability
 Utilizes a huge number of DataNodes to store the huge amount of data requested by data-intensive applications
   Increases the power consumption of the storage system
 Power-aware file systems are moving towards power-proportional design

[Background]
Power-proportional Storage System
 The system should consume energy in proportion to the amount of work performed [Barroso and Hölzle, 2007]
   Set the system’s operation to multiple gears, each containing a different number of data nodes
   Made possible by data placement methods
[Figure: data placement of D1 to D4 on Nodes 1 to 4 in High Gear and in Low Gear, with data migration between the gears]
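
The gear idea can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the `Gear` class, the `select_gear` function and the capacity figures are hypothetical, and only the node counts (8 and 16) are borrowed from the Experiment 1 setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gear:
    name: str
    active_nodes: int   # data nodes powered on in this gear
    capacity: float     # hypothetical: load (requests/s) the gear can serve


def select_gear(gears, load):
    """Pick the gear with the fewest active nodes that still covers the load."""
    for gear in sorted(gears, key=lambda g: g.active_nodes):
        if load <= gear.capacity:
            return gear
    # Load exceeds every gear: fall back to the highest gear.
    return max(gears, key=lambda g: g.active_nodes)


# Hypothetical capacities; node counts follow the Experiment 1 setup.
GEARS = [Gear("low", 8, 100.0), Gear("high", 16, 200.0)]

for load in (40.0, 150.0):
    gear = select_gear(GEARS, load)
    print(f"load={load}: {gear.name} gear, {gear.active_nodes} active nodes")
```

A real system would also need the data placement method to guarantee that every block stays reachable on the nodes of the lowest gear; that part is not modelled here.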

Research Motivation
 Gear-shifting is vital in a power-proportional system
   The system needs to reflect data that was updated in a lower gear to guarantee higher performance
   The updated data is re-transferred according to the data placement
 The gear-shifting process in current methods for HDFS [Rabbit, Sierra] is inefficient
   Bottleneck in metadata access
   High communication cost among nodes
[Rabbit] Rabbit: Robust and Flexible Power-proportional Storage, ACM SOCC 2010
[Sierra] Sierra: Practical Power-proportionality for Data Center Storage, ACM EuroSys 2011

Gear-shifting in current HDFS-based methods
[Figure, animated over 10 slides: a NameNode and DataNode1 to DataNode4, shown once in Low Gear (Write Dataset D = {D1, D2, D3, D4}) and once in High Gear (Gear Up); blocks D1 and D4 are copied step by step during Gear Up]
E.g., Rabbit, Sierra
 1. Access metadata to identify updated blocks
   Congestion
 2. Transfer updated blocks
   2.1 Command issuance
   2.2 Transfer block
   Sequentially (1 block/connection)
   Inefficiency
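
The cost of this centralized flow can be sketched in Python. This is a simplified model under stated assumptions, not code from HDFS, Rabbit or Sierra: the function name, the tuple layout and the example block placement are hypothetical. It only counts how many metadata lookups hit the single NameNode and how many connections are opened when each block uses its own connection.

```python
def sequential_gear_up(updated_blocks):
    """Model the centralized gear-up flow.

    updated_blocks: list of (block_id, source_node, target_node) tuples.
    """
    namenode_lookups = 0
    commands = []
    connections = 0
    for block_id, source, target in updated_blocks:
        namenode_lookups += 1                        # step 1: metadata access
        commands.append((source, target, block_id))  # step 2.1: command issuance
        connections += 1                             # step 2.2: 1 block/connection
    return {
        "namenode_lookups": namenode_lookups,
        "commands": commands,
        "connections": connections,
    }


# Hypothetical placement: D1 and D4 were updated while their nodes were off.
updated = [("D1", "DataNode2", "DataNode1"), ("D4", "DataNode3", "DataNode4")]
result = sequential_gear_up(updated)
print(result["namenode_lookups"], result["connections"])
```

In this model both counters grow linearly with the number of updated blocks, and every lookup and command passes through one node, which is the congestion and inefficiency the slides point at.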

Goal and Approach
 Goal
   Propose a novel architecture for efficient gear-shifting in power-proportional HDFS
 Approach
   Utilize distributed metadata management (MDM)
     Eliminate the bottleneck of the centralized MDM
   Couple NameNode and DataNode (NDCouplingHDFS)
     Localize the range of updated blocks maintained by metadata management
     Reduce the communication cost among nodes
   Enable multiple-block transfer to improve efficiency in HDFS

[Proposals]
Distributed MDM
 Distribute MDM to multiple nodes to decentralize the load during gear-shifting
 Require a distributed MDM that is update-conscious
   The MDM is transferred when the system shifts gears
   Low cost of search/insert/delete operations
 Distributed hash table-based methods are inefficient
   The hash function needs to be applied for each transferred file
 Range-based methods are efficient
   For a range of files, all the metadata can be transferred within a limited number of structure traversals
 Apply two range-based methods
   Each node statically maintains a separate subnamespace (Static Directory Partition, SDP)
   Parallel index technique with good concurrency control (Fat-Btree) [*]
[*] A Concurrency Control Protocol for Parallel B-tree structure without latch-coupling for explosively growing digital content, EDBT 2008
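
The contrast between the two partitioning styles can be sketched in Python. This is an illustration under stated assumptions, not SDP or Fat-Btree: a sorted list stands in for the range-based index, SHA-256 stands in for a distributed hash table's hash function, and the function names and file paths are hypothetical.

```python
import bisect
import hashlib


def range_slice(sorted_keys, lo, hi):
    """Range-based: keys with lo <= key < hi, found with two binary searches."""
    start = bisect.bisect_left(sorted_keys, lo)
    end = bisect.bisect_left(sorted_keys, hi)
    return sorted_keys[start:end]


def hash_owner(key, num_nodes):
    """Hash-based: the owner must be computed separately for every file."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_nodes


def hash_group(keys, num_nodes):
    """Group files by owner; costs one hash computation per file."""
    groups = {}
    for key in keys:
        groups.setdefault(hash_owner(key, num_nodes), []).append(key)
    return groups


# Hypothetical file paths, kept sorted as a range-based index would keep them.
keys = sorted(["/a/1", "/a/2", "/b/1", "/c/1"])
print(range_slice(keys, "/a/", "/b/"))
print(hash_group(keys, 4))
```

With the sorted structure, the metadata of a whole directory range is one contiguous slice, whereas the hash-based grouping touches every file individually.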

[Proposals]
NDCouplingHDFS with Distributed MDM
 Each node maintains a subnamespace of the whole namespace of the system
 The mapping information [Node, Range] is managed by the Distributed MDM
[Figure: four NDCouplingHDFS nodes, ND1 to ND4, each containing a Distributed MDM component and a Data Management component]
Range mapping: ND1: [1, 10], ND2: [11, 20], ND3: [21, 30], ND4: [31, ~]
 1. Send request of 25 (to an NDCouplingHDFS node)
 2. Forward request to responsible nodes
 3. Serve the request and return the results
 4. Return results
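
The request routing over the [Node, Range] mapping can be sketched in Python. This is a minimal sketch, assuming the ranges shown on the slide and a plain sorted list in place of the actual distributed index; the function names and the choice of ND1 as the receiving node are hypothetical.

```python
import bisect

# Lower bounds of the ranges on the slide:
# ND1: [1, 10], ND2: [11, 20], ND3: [21, 30], ND4: [31, ~]
RANGE_STARTS = [1, 11, 21, 31]
NODES = ["ND1", "ND2", "ND3", "ND4"]


def responsible_node(key):
    """Return the node whose range contains the key."""
    if key < RANGE_STARTS[0]:
        raise KeyError(f"key {key} is below the smallest range")
    index = bisect.bisect_right(RANGE_STARTS, key) - 1
    return NODES[index]


def handle_request(receiving_node, key):
    """Steps 1 and 2: look up the owner and decide whether to forward."""
    owner = responsible_node(key)
    forwarded = owner != receiving_node
    return owner, forwarded


# Hypothetical: the request of 25 arrives at ND1.
print(handle_request("ND1", 25))
```

Any node can answer the lookup because every node holds the mapping; only the request itself is forwarded to the node that owns the range.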

[Proposals]
Efficient Gear-shifting
[Figure, animated over 6 slides: four NDCouplingHDFS nodes A, B, C and D, each with a Distributed MDM component and a Data Management component, and files A1, B1, C1 and D1. Two nodes are marked as reactivated. WOL logs hold entries of the form <File, Temp Node, Intended Node> for A1 and D1.]
 1. Transfer updated metadata
 2. Command issuance
 3. Transfer blocks
 4. Updated metadata
 The process is distributed to multiple nodes
 Command issuance between the Distributed MDM and Data Management is performed locally
 Updated blocks are transferred in batches (multiple blocks per connection)
 Benefits: parallelism, reduced network cost, efficient block transfer
Experimental Evaluation
 Experiment 1
   Verify the effectiveness of the proposals in the gear-shifting process by comparing with the normal HDFS
     Updated block reflection is the major cost
     Coupling architecture, batch block transferring
 Experiment 2
   Evaluate the effectiveness of applying distributed index techniques to NDCouplingHDFS
     SDP and Fat-Btree, changing the number of nodes

[Experiment 1]
Validity of NDCouplingHDFS in Gear-shifting
 Compare the execution time of updated data reflection in NDCouplingHDFS with that in the normal HDFS, based on five configurations
   Combinations of architecture, distributed MDM (SDP, Fat-Btree), command issuance, block transfer
 Environment

Updated Data Reflection
  # Gears                                 2
  # Active nodes at Low Gear              8
  # Active nodes at High Gear             16
  # Files                                 16000
  File size                               1MB
HDFS
  Version                                 0.20.2
  Maximum number of transferred blocks    100
  Heartbeat interval                      1s
[Experiment 1]
Experimental Results
[Figure: execution time [s] and number of communication connections (command issuance) for NormalHDFS, SSS, SBS, SBB and FBB; annotated reductions: 46%, 41%]

Configuration     Normal HDFS  SSS         SBS         SBB       FBB
Architecture      HDFS         Coupling    Coupling    Coupling  Coupling
MDM               Central      SDP         SDP         SDP       Fat-Btree
Command issuance  Sequential   Sequential  Batch       Batch     Batch
Block transfer    Sequential   Sequential  Sequential  Batch     Batch

 The coupling architecture and batch block transfer strongly affected the performance

[Experiment 2]
Scalability of metadata operations
 Evaluate SDP vs. Fat-Btree
 Change the number of files and the number of nodes

Machine
  # Machines           1, 2, 4, 8
  CPU                  TM8600 1.0GHz
  Memory               DRAM 4GB
  NIC                  1000Mb/s
  OS                   Linux 3.0 64bit
  Java                 JDK-1.7.0
Fat-Btree
  Fanout               16
  Concurrency control  LCFB [Yoshihara, 2007]
Workload
  # Files              3000
  File size            1MB

[Experiment 2]
Experimental Results
 Fat-Btree gained better scalability as the number of nodes increased
   The read throughput scaled well due to better search cost and concurrency control
   The efficiency in write throughput is limited due to the synchronization cost of updating the tree structure
[Figure: two plots comparing SDP and Fat-Btree (FBT) on 1, 2, 4 and 8 nodes: read throughput [operations/s] and write throughput [operations/s]. A transaction: open/create metadata and read/write files.]

Conclusion
 Proposed NDCouplingHDFS for efficient gear-shifting in power-proportional HDFS
   Reduced the execution time of reflecting updated data by up to 46% compared with the normal HDFS
     Coupling architecture and batch block transferring
   Improved the I/O performance by applying distributed index techniques to NDCouplingHDFS
 NDCouplingHDFS
   Maintains support for MapReduce
   Expected to achieve real power-proportionality, including the power consumption of metadata management

NameNode and DataNode Coupling for a
Power-proportional Hadoop Distributed File System
Thank you for your attention!
