Inside HDFS Append

Yue Chen
http://linkedin.com/in/yuechen2
http://dataera.wordpress.com

HDFS Background
HDFS: Hadoop Distributed File System
Good for:
Large Files
Streaming Data Access
Bad for:
Lots of Small Files
Random Access

HDFS Architecture

HDFS Write

Append Background
Before append existed, a file became immutable once it was closed.
For database-style operations this is expensive, because any update means writing a new file.
Solution: append

Key for Designing Append
How do we guarantee consistency when something goes wrong?
Use more states!

States
Finalized: writing is complete and the replica's contents are final.
RBW (ReplicaBeingWritten): the replica is in the write pipeline and is visible to readers.
RUR (ReplicaUnderRecovery): the lease has expired and the replica is under recovery.
RWR (ReplicaWaitingToBeRecovered): when a DN goes down and restarts, all of its RBW replicas become RWR.
Temporary: the replica is being transferred between DNs.

Lease
What is a lease?
A write lock for file modification; it prevents concurrent writes to the same file.
Reading a file requires no lease.

Lease Expiration
Soft Limit
No renewal for 1 minute
Other clients may compete for the lease
Hard Limit
No renewal for 60 minutes
The lease is reclaimed even if no other client competes for it
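
The two limits can be pictured with a toy lease table. This is only a sketch under assumptions: `LeaseTable`, `Lease`, and the method names are hypothetical and are not HDFS classes; the timing values are the ones on the slide. Real HDFS runs lease recovery before another writer takes over, a step this model skips.

```python
from dataclasses import dataclass

SOFT_LIMIT_S = 60        # 1 minute, from the slide
HARD_LIMIT_S = 60 * 60   # 60 minutes, from the slide


@dataclass
class Lease:
    holder: str          # client that owns the write lock
    last_renewed: float  # time of the last renewal, in seconds


class LeaseTable:
    """Toy model: at most one writer lease per file path."""

    def __init__(self):
        self._leases = {}

    def acquire(self, path, client, now):
        lease = self._leases.get(path)
        if lease is None or lease.holder == client:
            self._leases[path] = Lease(client, now)
            return True
        # Another client holds the lease. It may be taken over only
        # after the holder has gone past the soft limit without renewing.
        if now - lease.last_renewed > SOFT_LIMIT_S:
            self._leases[path] = Lease(client, now)
            return True
        return False

    def renew(self, path, client, now):
        lease = self._leases.get(path)
        if lease is not None and lease.holder == client:
            lease.last_renewed = now
            return True
        return False

    def expire_hard(self, now):
        """Drop every lease past the hard limit; return the affected paths."""
        expired = [path for path, lease in self._leases.items()
                   if now - lease.last_renewed > HARD_LIMIT_S]
        for path in expired:
            del self._leases[path]
        return expired
```

In this model a second client is refused while the holder keeps renewing, can take over once the holder has been silent for more than a minute, and `expire_hard` clears a lease after an hour of silence with no competitor involved.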

State
Name Node (NN) block: 4 states
complete
under_construction
under_recovery
committed
Data Node (DN) replica: 5 states
Finalized
RBW (ReplicaBeingWritten; in the write pipeline, visible to readers)
RUR (ReplicaUnderRecovery; the lease has expired)
RWR (ReplicaWaitingToBeRecovered; when a DN goes down and restarts, all of its RBW replicas become RWR)
Temporary (being transferred between DNs)
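
The DN replica states can be read as a small state machine. The sketch below is a simplified reading of the slides, not the DataNode implementation: the event names (`close`, `dn_restart`, `start_recovery`, `transfer_done`) and the transition table are assumptions chosen for illustration.

```python
from enum import Enum


class ReplicaState(Enum):
    FINALIZED = "Finalized"
    RBW = "RBW"
    RWR = "RWR"
    RUR = "RUR"
    TEMPORARY = "Temporary"


S = ReplicaState

# (current state, event) -> next state.
# Event names are hypothetical labels for the situations the slides describe.
TRANSITIONS = {
    (S.RBW, "close"): S.FINALIZED,            # writer closes the block normally
    (S.RBW, "dn_restart"): S.RWR,             # DN went down and came back
    (S.RBW, "start_recovery"): S.RUR,         # lease expired, recovery begins
    (S.RWR, "start_recovery"): S.RUR,
    (S.FINALIZED, "start_recovery"): S.RUR,
    (S.TEMPORARY, "transfer_done"): S.FINALIZED,  # DN-to-DN copy finished
}


def next_state(state, event):
    """Return the state that follows `event`, or raise if it is not modeled."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state.value} on {event!r}") from None
```

For example, `next_state(ReplicaState.RBW, "dn_restart")` yields `ReplicaState.RWR`, matching the RWR description above.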

Overview (Hadoop 1.0.0)

Overall Procedure
From the client's perspective, an append starts with a call to append on DistributedFileSystem, which returns a stream object, FSDataOutputStream out. The client then calls out.write to append data to the file and out.close to close it.

write/append
1) Normal close
DFSOutputStream.close() -> FSNamesystem.completeFile() -> commitOrCompleteLastBlock()
The file's state in the NN (Name Node) is INode, not INodeUnderConstruction.
2) Abnormal close
The state remains INodeUnderConstruction, and the lease (write lock) on the file is not released. Recovery then takes two steps:
Lease recovery
Block recovery

Lease Recovery
When a file is not closed normally, the 3 replicas of its last block may be in different states, with different sizes and generation stamps (the generation stamp is the version of the block).
Lease recovery checks whether the previous lease holder has renewed the lease and whether the lease has exceeded the soft limit (softLimit); if it has, internalReleaseLease() is called.
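
The difference between the two close paths, and the soft-limit check that starts lease recovery, can be sketched as follows. Everything here is a hypothetical model: `FileEntry`, `close_normally`, and `maybe_recover_lease` are illustrative names, and `internal_release_lease` is only a stand-in for the real internalReleaseLease(), whose actual behavior is not reproduced.

```python
from dataclasses import dataclass
from typing import Optional

SOFT_LIMIT_S = 60  # soft limit from the slides: 1 minute


@dataclass
class FileEntry:
    under_construction: bool     # True while the file is open for write
    lease_holder: Optional[str]  # client holding the write lock, if any
    last_renewed: float          # time of the last lease renewal, in seconds
    recovering: bool = False     # set once recovery has been triggered


def close_normally(entry):
    """Normal close: the file is no longer under construction, lease released."""
    entry.under_construction = False
    entry.lease_holder = None


def internal_release_lease(entry):
    """Stand-in for internalReleaseLease(): drop the lease, flag recovery."""
    entry.lease_holder = None
    entry.recovering = True


def maybe_recover_lease(entry, now):
    """Trigger recovery only for an unclosed file whose holder went silent."""
    if not entry.under_construction or entry.lease_holder is None:
        return False
    if now - entry.last_renewed <= SOFT_LIMIT_S:
        return False  # holder is still within the soft limit
    internal_release_lease(entry)
    return True
```

A file closed through `close_normally` never enters recovery, while a file whose writer disappeared is picked up on the first check after the soft limit has passed.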

Block Recovery
The NN issues the block recovery command in its reply to a DN's heartbeat.
Find the best state among all replicas, and bring the remaining replicas to that state.
State Ranking: Finalized > RBW > RWR > RUR > Temporary
Once recovery finishes, the interrupted operation (append, write, etc.) continues.
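
The ranking rule can be sketched as a small function. This is a toy model, not the HDFS recovery code: `Replica` and `recover_block` are hypothetical names, and two details are assumptions that the slides do not state: the agreed length is the shortest length among the best-ranked replicas, and every recovered replica receives a new generation stamp passed in by the caller.

```python
from dataclasses import dataclass

# Ranking from the slide, best first.
RANK = ["Finalized", "RBW", "RWR", "RUR", "Temporary"]


@dataclass
class Replica:
    datanode: str
    state: str      # one of the names in RANK
    length: int     # bytes stored for this replica
    gen_stamp: int  # generation stamp (version of the block)


def recover_block(replicas, new_gen_stamp):
    """Bring all replicas to the best state found among them."""
    if not replicas:
        raise ValueError("no replicas to recover")
    best_rank = min(RANK.index(r.state) for r in replicas)
    best_state = RANK[best_rank]
    # Assumption: agree on the shortest length among the best-ranked replicas.
    target_len = min(r.length for r in replicas if r.state == best_state)
    # Assumption: each recovered replica is stamped with the new generation stamp.
    return [Replica(r.datanode, best_state, target_len, new_gen_stamp)
            for r in replicas]
```

With two RBW replicas of 100 and 90 bytes and one RWR replica of 120 bytes, this model returns three RBW replicas of 90 bytes, all carrying the new generation stamp.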
