April 13, 2015
In the last post we discussed the architecture and the features of HDFS . In this post I will try to address how HDFS is able to achieve the high data availability, high data transfers and robustness despite the hardware failures.
When a client submits a file to be stored on HDFS, NameNode internally accepts the request and breaks down the file into blocks.
Say: The input file size is 300 MB. The standard blocksize used by HDFS is 64MB. So the input file is divided into 5 blocks out of which the first 4 blocks are of equal size (64MB) and the size of the last block depends on the leftover chunk of input file.
Each of those blocks will be replicated and distributed across different nodes to avoid the danger of losing the blocks due to hardware failures. Clients can specify the replication factor for the file when he stores it into HDFS.
Img source: https://developer.yahoo.com/hadoop/
Fig: DataNodes holding blocks of multiple files with a replication factor of 2. The NameNode maps the filenames onto the block ids.
Q: What is the technique employed by NameNodes to distribute the blocks onto DataNodes?
Answer: Rack Awareness.
In a datacenter, machines are arranged across racks. In each rack, there are multiple machines. Data transfers within the rack is much faster than the data transfer across the racks (Because of the overhead of data going through the internal rack switches and then across-the-rack-switch). When you install the Hadoop software, you need to specify the network topology. This is a manual admin task. NameNode uses this information to distribute the blocks. If the replication factor is 3, the policy is to put one replica on the local node in the same rack, another on a node in a different remote rack and the last on a different node in the same remote rack.
Q: How are high data transfers achieved?
Answer: Since the same block of data is replicated across various machines, the data can be read in parallel. However the writes become expensive. That’s the reason why HDFS is more suitable for write-once-read-any number of times (WORA) setup.
So far, we have looked at how HDFS achieves high data availability despite failures, let us now shift our radar to see how HDFS achieves the robustness.
Robustness: In this section, we will see how HDFS stores the data reliably in the presence of failures. The three common type of failures are as follows:
i) Data Node Failures: DataNodes sends heart beat messages back to NameNode every x (say 3) seconds to indicate that they are up and running. They also send the Block Report containing all the blocks that are stored on them to the NameNode. Based on the Block Reports received from the DataNodes and theNameNodes will update its metadata as required. If the NameNode does not receive these messages, it detects the failure of the DataNodes and marks these nodes as dead. Moving forward, it does not forward any requests to these nodes. The NameNodes may also start re-replication of the blocks as required due to the death of DataNodes.
ii) Data Corruption in DataNodes: NameNode computes the checksum of the blocks it distributes to the DataNodes and stores this information in a hidden file in a namespace. It computes the checksum again when it retrieves the data from DataNodes. If the checksum does not match,the NameNode detects the block corruption and then initiates the data transfer from some other DataNode.
iii) NameNode Failure: NameNode machine is a single point of failure for HDFS. If the NameNode fails, then manual intervention is required.
In this post, we saw how HDFS achieves high data availability, high data transfers and robustness despite hardware failures due to commodity hardware. I hope you had a good understanding of HDFS. In the next post, we will discuss about configuring Hadoop in a SingleNode/MultiNode setup.