2. HDFS Client Write Flow - Creating the OS (OutputStream)
● Client calls DistributedFileSystem.create() method;
a. Performs an RPC call to the NN to create the file in the namespace image
b. Assuming the file could be created, a DFSOutputStream instance will be created:
i. Sets up basic properties for the write, such as block size, replication, encryption type, caching
strategy, etc.;
ii. Computes the chunk size (byte array buffer for each os.write);
iii. Creates the DataStreamer instance;
c. Starts the DataStreamer thread and returns the DFSOutputStream instance to the client;
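The chunk/packet sizing in step ii can be sketched as follows. This is a toy illustration, not Hadoop source; the sizes (512-byte chunks, 4-byte CRC checksums, 64 KiB packets) mirror common HDFS defaults but are hard-coded here for the demo.

```java
// Illustrative sketch (not Hadoop code): how a client output stream might
// derive how many checksum chunks fit into one packet.
public class ChunkMath {
    static int chunksPerPacket(int packetSize, int chunkSize, int checksumSize) {
        // each chunk travels together with its checksum, so both count
        // against the packet's byte budget
        return packetSize / (chunkSize + checksumSize);
    }

    public static void main(String[] args) {
        // HDFS-like defaults: 512-byte chunks, 4-byte CRC, 64 KiB packets
        int n = chunksPerPacket(64 * 1024, 512, 4);
        System.out.println(n);
    }
}
```

With these numbers, 127 chunks of data (plus their checksums) fill one packet; the real client also reserves space for a packet header, which this sketch ignores.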
4. HDFS Client Write Flow - Writing to the OS
● With the OS instance, the client is ready to write data.
● As the client writes bytes, the OS buffer fills up and bytes are flushed, as we can
see in FSOutputSummer.write1;
● While flushing, a checksum is calculated for the chunk and written together
with the data: FSOutputSummer.writeChecksumChunks;
● At this point, data is not yet streamed over the wire, but grouped into another
structure called a Packet. Several chunks form a packet:
DFSOutputStream.writeChunk
● Once the max number of chunks for the packet is reached, the given packet is
ready to be transmitted.
● DFSOutputStream enqueues the packet to be processed by DataStreamer
asynchronously.
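The chunk-to-packet accumulation described above can be modeled with a short sketch. This is a toy model, not Hadoop code: chunk and packet sizes are shrunk to keep the demo readable, and CRC32 stands in for the configurable checksum.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.zip.CRC32;

public class PacketAccumulator {
    static final int CHUNK_SIZE = 4;        // tiny sizes for the demo
    static final int CHUNKS_PER_PACKET = 2;

    final Queue<byte[]> dataQueue = new ArrayDeque<>(); // packets ready for streaming
    ByteArrayOutputStream packet = new ByteArrayOutputStream();
    int chunksInPacket = 0;

    // FSOutputSummer-style behaviour: checksum the chunk, append checksum + data
    void writeChunk(byte[] chunk) {
        CRC32 crc = new CRC32();
        crc.update(chunk, 0, chunk.length);
        int c = (int) crc.getValue();
        // 4-byte checksum precedes the chunk data inside the packet
        packet.write(c >>> 24); packet.write(c >>> 16);
        packet.write(c >>> 8);  packet.write(c);
        packet.write(chunk, 0, chunk.length);
        if (++chunksInPacket == CHUNKS_PER_PACKET) {
            dataQueue.add(packet.toByteArray()); // hand off to the streamer thread
            packet = new ByteArrayOutputStream();
            chunksInPacket = 0;
        }
    }

    public static void main(String[] args) {
        PacketAccumulator acc = new PacketAccumulator();
        byte[] data = "0123456789abcdef".getBytes(); // 16 bytes -> 4 chunks -> 2 packets
        for (int off = 0; off < data.length; off += CHUNK_SIZE) {
            byte[] chunk = new byte[CHUNK_SIZE];
            System.arraycopy(data, off, chunk, 0, CHUNK_SIZE);
            acc.writeChunk(chunk);
        }
        // each packet holds 2 chunks of (4-byte crc + 4-byte data) = 16 bytes
        System.out.println(acc.dataQueue.size() + " packets, "
            + acc.dataQueue.peek().length + " bytes each");
    }
}
```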
5. HDFS Client Write Flow - Writing to the OS
● First, chunks of data are accumulated into packets; packets are then enqueued for
async processing
6. HDFS Client Write Flow - Writing to the OS
● The DataStreamer thread, started while the DFSOutputStream was being created,
keeps polling the queue of packets: DFSOutputStream.DataStreamer.run;
a. Reads packet from the queue;
b. If block construction is in the init or recovery stage:
■ creates a new block in the namespace image (via the NN ADD_BLOCK RPC call);
■ opens an OS to the first DN in the pipeline;
● sets up the Socket for the first DN;
● creates the OS instance;
● creates a "Sender" instance wrapping the OS and sends a WRITE_BLOCK OP (this
is an initial message to establish the stream between Client and DN; no actual
data is sent yet);
■ creates a ResponseProcessor and sets the stage to DATA_STREAMING;
c. If this is the last packet for the block, sets the stage to PIPELINE_CLOSE;
d. Finally, writes the data to the OS.
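The streamer loop above can be sketched as a producer/consumer pair. This is a toy model, not Hadoop code: the NN RPC and DN socket setup are reduced to a log entry, and the Packet record is a hypothetical stand-in for the real packet class.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Toy model of the DataStreamer loop: the writer thread enqueues packets;
// the streamer thread dequeues them, "opens" a block on first use, and
// closes the pipeline when it sees the last packet of the block.
public class StreamerDemo {
    enum Stage { PIPELINE_SETUP, DATA_STREAMING, PIPELINE_CLOSE }

    record Packet(int seqno, boolean lastInBlock) {}

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Packet> dataQueue = new LinkedBlockingQueue<>();
        StringBuilder log = new StringBuilder();

        Thread streamer = new Thread(() -> {
            Stage stage = Stage.PIPELINE_SETUP;
            while (true) {
                Packet p;
                try { p = dataQueue.take(); } catch (InterruptedException e) { return; }
                if (stage == Stage.PIPELINE_SETUP) {
                    // in the real client: NN ADD_BLOCK RPC + socket to the first DN
                    log.append("addBlock+connect;");
                    stage = Stage.DATA_STREAMING;
                }
                log.append("send ").append(p.seqno()).append(';');
                if (p.lastInBlock()) { stage = Stage.PIPELINE_CLOSE; break; }
            }
            log.append("close");
        });
        streamer.start();
        dataQueue.put(new Packet(0, false));
        dataQueue.put(new Packet(1, true));  // last packet -> pipeline close
        streamer.join();
        System.out.println(log);
    }
}
```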
8. HDFS Client Write Flow - DN side
● On the DN side, DataXceiverServer is responsible for accepting incoming socket connections. It then
spawns a DataXceiver thread per connection.
● DataXceiver extends Receiver in order to parse DataTransfer operations. A Receiver is the
counterpart of Sender, with logic to parse the DT protobuf message. Here it discovers the type of
OP.
● Since this is a WRITE example, DataXceiver processes the WRITE_BLOCK op;
● The TCP server handles client connections and delegates the processing to its DataXceiver threads.
● DataXceiver uses BlockReceiver, a helper class to read packets from the socket IS and
forward them to the next DN in the pipeline.
● Assuming everything went smoothly, BlockReceiver writes the data to the mirror and to local disk. It also
performs checksum verification and writes the block checksum info.
● If this is the last packet for the block, it also triggers a finalize-block OP to the NN.
● DataXceiver then generates a response and writes it back to the socket OS.
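The checksum verification step can be illustrated in a few lines. This is a toy model, not Hadoop's BlockReceiver: CRC32 stands in for the configured checksum algorithm, and the "wire" is just a local variable.

```java
import java.util.zip.CRC32;

// Toy model of the verification a receiving DN performs: recompute the
// checksum over the chunk payload and compare it with the checksum that
// travelled alongside it in the packet.
public class VerifyDemo {
    static boolean verifyChunk(byte[] data, long expectedCrc) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue() == expectedCrc;
    }

    public static void main(String[] args) {
        byte[] chunk = "block-data".getBytes();
        CRC32 sender = new CRC32();
        sender.update(chunk, 0, chunk.length);
        long onWire = sender.getValue();          // checksum written by the client

        System.out.println(verifyChunk(chunk, onWire)); // intact chunk passes
        chunk[0] ^= 0x01;                               // simulate corruption in transit
        System.out.println(verifyChunk(chunk, onWire)); // corruption is detected
    }
}
```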
10. HDFS Client Read Flow - Creating the IS (InputStream)
● Client calls DistributedFileSystem.open() method;
a. It uses DFSClient to create a DFSInputStream instance and wraps it in a FSDataInputStream.
b. DFSInputStream initialisation locates the file blocks (via an RPC call to the NN), then sets up various
info regarding block locations, lengths, etc., which will all be needed during the client read.
c. That's enough to have the DFSInputStream instance created and returned to the client.
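Part of the bookkeeping set up in step b is the mapping from a file offset to a block. A minimal sketch of that arithmetic, assuming a fixed block size (the real stream consults the LocatedBlocks returned by the NN):

```java
// Toy model (not Hadoop code): map a file offset to a block index and an
// offset inside that block, given the file's block size.
public class BlockLocator {
    static long blockIndex(long fileOffset, long blockSize) {
        return fileOffset / blockSize;
    }

    static long offsetInBlock(long fileOffset, long blockSize) {
        return fileOffset % blockSize;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;  // a common HDFS default (128 MiB)
        long pos = 300L * 1024 * 1024;        // read position at 300 MiB
        // lands in the third block (index 2), 44 MiB into it
        System.out.println(blockIndex(pos, blockSize) + " " + offsetInBlock(pos, blockSize));
    }
}
```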
11. HDFS Client Read Flow - Reading from the IS
● With an IS instance available, clients can start reading bytes from the given
file.
● Inside the DFSInputStream.read() method, it checks whether this is the first block or
whether the current block is done, to decide if it has to create a new instance of
BlockReader.
○ If that's the case, it locates DNs holding the block and uses BlockReaderFactory to build a
new instance of BlockReader.
● BlockReader interface defines the contract for interacting with DNs data
directly. It currently has 4 implementations:
○ BlockReaderLocal and BlockReaderLocalLegacy (used for short-circuit reads, i.e. SCR);
○ RemoteBlockReader (deprecated) and RemoteBlockReader2 (used for remote
reads);
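The contract-plus-factory pattern described above can be sketched as follows. This is an illustrative reduction, not the real Hadoop interface: the method names and the locality flag are hypothetical, and a String return stands in for actual byte reads.

```java
// Sketch of the BlockReader pattern: one contract, multiple implementations,
// and a factory choosing between them based on locality.
interface BlockReader {
    String read(); // the real interface reads bytes; a String keeps the demo short
}

class LocalReader implements BlockReader {
    public String read() { return "local"; }   // short-circuit read from local disk
}

class RemoteReader implements BlockReader {
    public String read() { return "remote"; }  // TCP read via a DN
}

public class ReaderFactoryDemo {
    // hypothetical selection logic; the real factory also weighs config and caching
    static BlockReader create(boolean blockIsLocal) {
        return blockIsLocal ? new LocalReader() : new RemoteReader();
    }

    public static void main(String[] args) {
        System.out.println(create(true).read() + " " + create(false).read());
    }
}
```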
12. HDFS Client Read Flow - Reading from the IS
● It's also possible to plug in custom implementations to be used by client read
operations.
● We'll focus on client remote reads:
○ BlockReaderFactory creates an instance of RemoteBlockReader2.
○ While initializing it, BlockReaderFactory establishes the TCP Socket connection with a
DN and passes it to RemoteBlockReader2.
○ Using the same Sender class from the write examples in previous slides, it sends a READ_BLOCK
operation to the DN.
○ It then obtains an IS from the TCP Socket and wraps it for reading block data.
○ The RemoteBlockReader2 instance is then returned to DFSInputStream.
○ DFSInputStream calls the RemoteBlockReader2.read() method, which in turn uses the
PacketReceiver.readNextPacket() method to read block data in packets from the TCP
Socket IS.
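The packet-by-packet read loop can be illustrated with simple length-prefixed framing. This is a toy model, not Hadoop's PacketReceiver (whose frames also carry headers and checksums); the demo uses an in-memory stream instead of a socket.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Toy model of reading length-prefixed packets from a stream:
// each packet is a 4-byte length followed by that many payload bytes.
public class PacketReadDemo {
    static byte[] readNextPacket(DataInputStream in) throws IOException {
        int len = in.readInt();   // frame header: payload length
        byte[] buf = new byte[len];
        in.readFully(buf);        // block until the whole payload arrives
        return buf;
    }

    public static void main(String[] args) throws IOException {
        // build a stream holding two framed packets, as a sender would emit them
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        for (String s : new String[] {"pkt-one", "pkt-two"}) {
            byte[] b = s.getBytes();
            out.writeInt(b.length);
            out.write(b);
        }
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bos.toByteArray()));
        System.out.println(new String(readNextPacket(in)) + " " + new String(readNextPacket(in)));
    }
}
```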
14. HDFS Client Read Flow - DN side
● TCP Connection creation and DT parsing logic is the same as from slide #8
for Write operation.
● Since this is a READ_BLOCK op, DataXceiver uses BlockSender, a helper
class to read packets from disk or the memory cache, then write them back to the TCP
OS.
● Using BlockSender's info about the block, DataXceiver verifies the checksum
and sends a DT message back to the client.
● DataXceiver calls BlockSender.sendBlock to effectively read block data
and send it back to the client via the TCP stream.
● BlockSender implements logic specific to caching, packet read control and
stream writing.