NameNode and DataNode Coupling for a Power-proportional Hadoop Distributed File System

An architecture for efficient gear-shifting for power-proportional Hadoop Distributed File System

  • Command issuance means that the NameNode, through heartbeats, sends DataNode M a command containing a pair (Block b, DataNode N). After receiving the command, DataNode M looks for block b and sends it to DataNode N.
  • hi Le-san, what is the command issuance's job?

Transcript

  • 1. NameNode and DataNode Coupling for a Power-proportional Hadoop Distributed File System
    Hieu Hanh Le, Satoshi Hikida and Haruo Yokota
    Tokyo Institute of Technology
    Appeared in DASFAA 2013, the 18th International Conference on Database Systems for Advanced Applications (Wuhan, China)
  • 2. Agenda
    • Background
    • Research Motivation
    • Goal and Approach
    • Proposals
    • Experimental Evaluation
    • Conclusion
  • 3. Background
    • Hadoop Distributed File System (HDFS) is widely used as data storage for applications in the Cloud
        • Commercial off-the-shelf-based system
        • Supports the MapReduce framework
        • Good scalability
    • Utilizes a huge number of DataNodes to store the huge amount of data requested by data-intensive applications
        • Expands the power consumption of the storage system
    • Power-aware file systems are moving towards power-proportional design
  • 4. [Background] Power-proportional Storage System
    • The system should consume energy in proportion to the amount of work performed [Barroso and Holzle, 2007]
    • Set the system's operation to multiple gears containing different numbers of data nodes
    • Made possible by data placement methods
    [Figure: High Gear and Low Gear layouts of data D1 to D4 across Node1 to Node4, with D1 and D4 marked as migrated.]
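The idea of gears can be illustrated with a minimal sketch. The gear sizes reuse the 8 and 16 active nodes from the Experiment 1 setup later in this deck; the `Gear` class, `pick_gear`, and the per-node capacity figure are hypothetical names and values chosen for illustration, not part of the authors' system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gear:
    name: str
    active_nodes: int


# Gear sizes taken from the Experiment 1 setup (8 and 16 active nodes).
GEARS = [Gear("low", 8), Gear("high", 16)]


def pick_gear(load, per_node_capacity, gears=GEARS):
    """Return the smallest gear whose active nodes can serve `load`.

    `load` and `per_node_capacity` are in the same (hypothetical) unit,
    for example requests per second.
    """
    for gear in sorted(gears, key=lambda g: g.active_nodes):
        if gear.active_nodes * per_node_capacity >= load:
            return gear
    # No gear is large enough: saturate at the highest gear.
    return max(gears, key=lambda g: g.active_nodes)


if __name__ == "__main__":
    print(pick_gear(500, 100).name)
    print(pick_gear(1200, 100).name)
```

With fewer nodes active under light load, the energy drawn follows the work performed more closely, which is the power-proportional property the slide refers to.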
  • 5. Research Motivation
    • Gear-shifting is vital in a power-proportional system
        • The system needs to reflect updated data that was modified in a lower gear to guarantee the higher performance
        • Re-transfer the updated data according to the data placement
    • The gear-shifting process in current methods for HDFS [Rabbit, Sierra] is inefficient
        • Bottleneck in metadata access
        • High communication cost among nodes
    Rabbit: Robust and Flexible Power-proportional Storage, ACM SOCC 2010
    Sierra: Practical Power-proportionality for Data Center Storage, ACM EuroSys 2011
  • 6-15. Gear-shifting in current HDFS-based methods [1/10 to 10/10] (e.g., Rabbit, Sierra)
    [Figure, built up over ten slides: a NameNode and DataNode1 to DataNode4. The dataset D = {D1, D2, D3, D4} is written in Low Gear; on Gear Up to High Gear, D1 and D4 are transferred.]
    1. Access metadata to identify updated blocks (labelled "Congestion")
    2. Transfer updated blocks
        2.1 Command issuance
        2.2 Transfer block
    • Blocks are transferred sequentially (1 block/connection)
    • Result: congestion and inefficiency
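The baseline flow above can be sketched as follows. This is a simplified model, not HDFS code: `Command`, `issue_commands`, and `transfer_sequentially` are hypothetical names, and which DataNode temporarily holds D1 and D4 is an assumption, since the flattened figure does not make it clear.

```python
from collections import namedtuple

# One command tells `source` to send `block` to `target`.
Command = namedtuple("Command", "block source target")


def issue_commands(updated, placement):
    """Model the centralized step: the NameNode scans its metadata.

    updated:   {block: node that holds it temporarily (written in low gear)}
    placement: {block: node that should hold it in high gear}
    """
    return [
        Command(block, source, placement[block])
        for block, source in sorted(updated.items())
        if placement[block] != source
    ]


def transfer_sequentially(commands):
    """Model the baseline transfer: one connection per block."""
    connections = 0
    for _ in commands:
        connections += 1  # open a connection, send one block, close
    return connections


if __name__ == "__main__":
    # Assumed temporary locations for D1 and D4.
    updated = {"D1": "DataNode2", "D4": "DataNode3"}
    placement = {"D1": "DataNode1", "D4": "DataNode4"}
    commands = issue_commands(updated, placement)
    print(commands)
    print(transfer_sequentially(commands))
```

Every command originates at the single NameNode and every block needs its own connection, which is where the slides locate the congestion and inefficiency.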
  • 16. Goal and Approach
    • Goal
        • Propose a novel architecture for efficient gear-shifting for power-proportional HDFS
    • Approach
        • Utilize distributed metadata management (MDM)
            • Eliminate the bottleneck of the centralized MDM
        • Couple NameNode and DataNode (NDCouplingHDFS)
            • Localize the range of updated blocks maintained by metadata management
            • Reduce the communication cost among nodes
        • Enable multiple-block transfer to improve the efficiency in HDFS
  • 17. [Proposals] Distributed MDM
    • Distribute MDM to multiple nodes to decentralize the load during gear-shifting
    • Requires a distributed MDM that is update-conscious
        • The MDM is transferred when the system shifts gears
        • Low cost of search/insert/delete operations
    • Inefficient: distributed hash table-based method
        • The hash function has to be applied for each transferred file
    • Efficient: range-based method
        • For a range of files, all the metadata can be transferred within a limited number of structure traversals
    • Apply two range-based methods
        • Each node statically maintains a separate subnamespace (Static Directory Partition, SDP)
        • Parallel index technique with good concurrency control (Fat-Btree) [*]
    [*] A Concurrency Control Protocol for Parallel B-tree structure without latch-coupling for explosively growing digital content, EDBT 2008
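The contrast between the hash-based and range-based methods can be illustrated with a small sketch. It is not SDP or Fat-Btree: a sorted Python list stands in for the range-partitioned index, `zlib.crc32` stands in for a DHT hash function, and both function names are hypothetical.

```python
import bisect
import zlib


def range_transfer(sorted_ids, lo, hi):
    """Range-based: two binary searches locate one contiguous slice."""
    start = bisect.bisect_left(sorted_ids, lo)
    end = bisect.bisect_right(sorted_ids, hi)
    return sorted_ids[start:end]


def hash_transfer(ids, target_node, num_nodes):
    """Hash-based: every file is hashed to learn whether it moves."""
    return [
        i for i in ids
        if zlib.crc32(str(i).encode()) % num_nodes == target_node
    ]


if __name__ == "__main__":
    file_ids = list(range(1, 41))
    # Metadata for files 21..30 is found with two lookups.
    print(range_transfer(file_ids, 21, 30))
    # The hash-based variant touches all 40 files to pick its subset.
    print(hash_transfer(file_ids, 2, 4))
```

The range-based extraction costs a couple of index lookups regardless of how many files fall in the range, while the hash-based one performs work per file.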
  • 18. [Proposals] NDCouplingHDFS with Distributed MDM
    • Each node maintains a subnamespace of the whole namespace of the system
    • The mapping information [Node, Range] is managed by Distributed MDM
    [Figure: four NDCouplingHDFS nodes (ND1 to ND4), each combining Distributed MDM and Data Management.]
    Ranges: ND1: [1, 10], ND2: [11, 20], ND3: [21, 30], ND4: [31, ~]
    1. Send request of 25
    2. Forward request to responsible nodes
    3. Serve the request and return the results
    4. Return results
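The [Node, Range] mapping on this slide can be sketched as a lookup over the range lower bounds. The ranges are the ones shown on the slide; `responsible_node` is a hypothetical helper, and a flat sorted list is only a stand-in for whatever structure the Distributed MDM really uses.

```python
import bisect

# Lower bounds of the ranges from the slide:
# ND1: [1, 10], ND2: [11, 20], ND3: [21, 30], ND4: [31, ~]
RANGE_STARTS = [1, 11, 21, 31]
NODES = ["ND1", "ND2", "ND3", "ND4"]


def responsible_node(key):
    """Return the node whose range contains `key`."""
    if key < RANGE_STARTS[0]:
        raise KeyError(f"key {key} is below the first range")
    # bisect_right counts how many range starts are <= key.
    return NODES[bisect.bisect_right(RANGE_STARTS, key) - 1]


if __name__ == "__main__":
    # The slide's example: a request for 25 is forwarded to ND3.
    print(responsible_node(25))
```

Any node that receives a request can run the same lookup and forward it, which matches steps 1 to 4 on the slide.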
  • 19-24. [Proposals] Efficient Gear-shifting [1/6 to 6/6]
    [Figure, built up over six slides: four NDCouplingHDFS nodes (A to D), each combining Distributed MDM and Data Management and associated with files A1, B1, C1, D1. Two nodes are reactivated; WOL logs hold entries of the form <File, Temp Node, Intended Node> for A1 and D1.]
    1. Transfer updated metadata
    2. Command issuance
    3. Transfer blocks
    4. Updated metadata
    • The process is distributed to multiple nodes (parallelism)
    • The command issuance from Distributed MDM and Data Management is performed locally (reduced network cost)
    • Updated blocks are transferred in batches, multiple blocks per connection (efficient block transfer)
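Batch transfer driven by the WOL log can be sketched as grouping log entries by source and destination. The entry layout follows the slide's <File, Temp Node, Intended Node>; the default limit of 100 blocks per connection comes from the Experiment 1 setup. `plan_batches` and the sample entries (including which node temporarily holds which file) are assumptions for illustration.

```python
from collections import defaultdict


def plan_batches(wol_log, max_blocks_per_connection=100):
    """Group WOL log entries into per-connection batches.

    wol_log: iterable of (file_id, temp_node, intended_node) tuples.
    Returns a list of (temp_node, intended_node, [file_ids]) batches,
    each of which would be sent over a single connection.
    """
    pending = defaultdict(list)
    for file_id, temp_node, intended_node in wol_log:
        pending[(temp_node, intended_node)].append(file_id)

    batches = []
    for (src, dst), files in sorted(pending.items()):
        for i in range(0, len(files), max_blocks_per_connection):
            batches.append((src, dst, files[i:i + max_blocks_per_connection]))
    return batches


if __name__ == "__main__":
    # Assumed log: A1 and A2 were parked on node B, D1 on node C.
    log = [("A1", "B", "A"), ("A2", "B", "A"), ("D1", "C", "D")]
    for batch in plan_batches(log):
        print(batch)
```

Because each node holds its own log, this planning can run locally on every node in parallel, and the number of connections drops from one per block to one per batch.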
  • 25. Experimental Evaluation
    • Experiment 1
        • Verify the effectiveness of the proposals in the gear-shifting process by comparing with the normal HDFS
            • Updated block reflection is the major cost
            • Coupling architecture, batch block transferring
    • Experiment 2
        • Evaluate the effectiveness of applying the distributed index technique to NDCouplingHDFS
            • SDP and Fat-Btree, changing the number of nodes
  • 26. [Experiment 1] Validity of NDCouplingHDFS in Gear-shifting
    Updated Data Reflection
    • Compare the execution time of updated data reflection in NDCouplingHDFS with the normal HDFS, based on five configurations
        • Combinations of architecture, distributed MDM (SDP, Fat-Btree), command issuance, block transfer
    • Environment
        # Gears                                 2
        # Active nodes at Low Gear              8
        # Active nodes at High Gear             16
        # Files                                 16000
        File size                               1 MB
        HDFS version                            0.20.2
        Maximum number of transferred blocks    100
        Heartbeat interval                      1 s
  • 27. [Experiment 1] Experimental Results
    [Figure: execution time [s] and number of communication connections [command issuance] for NormalHDFS, SSS, SBS, SBB and FBB; annotated with 46% and 41%.]
        Configuration       NormalHDFS   SSS          SBS          SBB        FBB
        Architecture        HDFS         Coupling     Coupling     Coupling   Coupling
        MDM                 Central      SDP          SDP          SDP        Fat-Btree
        Command issuance    Sequential   Sequential   Batch        Batch      Batch
        Block transfer      Sequential   Sequential   Sequential   Batch      Batch
    • The coupling architecture and batch block transfer strongly affected the performance
  • 28. [Experiment 2] Scalability of metadata operations
    • Evaluate SDP vs. Fat-Btree
    • Change the number of files and the number of nodes
        Machine
            # Machines             1, 2, 4, 8
            CPU                    TM8600 1.0GHz
            Memory                 DRAM 4GB
            NIC                    1000Mb/s
            OS                     Linux 3.0 64bit
            Java                   JDK-1.7.0
        Fat-Btree
            Fanout                 16
            Concurrency control    LCFB [Yoshihara, 2007]
        Workload
            # Files                3000
            File size              1MB
  • 29. [Experiment 2] Experimental Results
    • Fat-Btree gained better scalability as the number of nodes increased
        • The read throughput scaled well due to better search cost and concurrency control
        • The efficiency in write throughput is limited by the synchronization cost of updating the tree structure
    [Figure: read throughput [operations/s] and write throughput [operations/s] for SDP and Fat-Btree with 1, 2, 4 and 8 nodes. A transaction: open/create metadata and read/write files.]
  • 30. Conclusion
    • Proposed NDCouplingHDFS for efficient gear-shifting in power-proportional HDFS
        • Reduced the execution time of reflecting updated data by up to 46% compared with the normal HDFS
            • Coupling architecture and batch block transferring
        • Improved the I/O performance by applying a distributed index technique to NDCouplingHDFS
    • NDCouplingHDFS
        • Maintains support for MapReduce
        • Expected to achieve real power-proportionality, including the power consumption of metadata management
  • 31. NameNode and DataNode Coupling for a Power-proportional Hadoop Distributed File System
    Thank you for your attention!