Datanodes
Although DataNodes do not contain metadata about the directories and files stored in an HDFS cluster, they do contain a small amount of metadata about the DataNode itself and its relationship to a cluster. This shows the output of running the tree command on the DataNode’s directory, configured by setting dfs.datanode.data.dir in hdfs-site.xml.
data/dfs/data/ ├── current │ ├── BP-1079595417-192.168.2.45-1412613236271 │ │ ├── current │ │ │ ├── VERSION │ │ │ ├── finalized │ │ │ │ └── subdir0│ │ │ │ └── subdir1 │ │ │ │ ├── blk_1073741825 │ │ │ │ └── blk_1073741825_1001.meta │ │ │ │── lazyPersist │ │ │ └── rbw │ │ ├── dncp_block_verification.log.curr │ │ ├── dncp_block_verification.log.prev │ │ └── tmp │ └── VERSION └── in_use.lock
The purpose of these files are as follows:
- BP-random integer-NameNode-IP address-creation time
- Top level directory for datanodes. The naming convention for this directory is significant and constitutes a form of cluster metadata. The name is a block pool ID. “BP” stands for “block pool,” the abstraction that collects a set of blocks belonging to a single namespace. In the case of a federated deployment, there are multiple “BP” sub-directories, one for each block pool. The remaining components form a unique ID: a random integer, followed by the IP address of the NameNode that created the block pool, followed by creation time. 
- VERSION
- Text file containing multiple properties, such as layoutVersion, clusterId and cTime, which is much like the NameNode and JournalNode. There is a VERSION file tracked for the entire DataNode as well as a separate VERSION file in each block pool sub-directory. - In addition to the properties already discussed earlier, the DataNode’s VERSION files also contain: - storageType
- storageTypefield is set to- DATA_NODE.
- blockpoolID
- Repeats the block pool ID information encoded into the sub-directory name. 
 
- finalized/rbw
- Both - finalizedand- rbwcontain a directory structure for block storage. This holds numerous block files, which contain HDFS file data and the corresponding .meta files, which contain checksum information.- rbwstands for “replica being written”. This area contains blocks that are still being written to by an HDFS client. The finalized sub-directory contains blocks that are not being written to by a client and have been completed.
- lazyPersist
- HDFS is incorporating a new feature to support writing transient data to memory, followed by lazy persistence to disk in the background. If this feature is in use, then a lazyPersist sub-directory is present and used for lazy persistence of in-memory blocks to disk. We’ll cover this exciting new feature in greater detail in a future blog post. 
- scanner.cursor
- File to which the "cursor state" is saved. - The DataNode runs a block scanner which periodically does checksum verification of each block file on disk. This scanner maintains a "cursor," representing the last block to be scanned in each block pool slice on the volume, and called the "cursor state." 
- in_use.lock
- Lock file held by the DataNode process, used to prevent multiple DataNode processes from starting up and concurrently modifying the directory. 

