Create directories and configure ownership + permissions on the appropriate hosts as described below. If any of these directories already exist, we recommend deleting and recreating them.
Use the following instructions to create appropriate directories:
We strongly suggest that you edit and source the files included in scripts.zip (downloaded in Download Companion Files). Alternatively, you can copy their contents to your ~/.bash_profile to set up these environment variables.
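As a minimal sketch, the environment variables used in this section could look like the following. The paths and names below are the example values from this section, not defaults; adjust them for your cluster before sourcing the file or adding the lines to ~/.bash_profile.

```shell
# Example environment variables for the directory-setup commands below.
# All values are illustrative; replace them with your cluster's layout.
export DFS_NAME_DIR="/grid/hadoop/hdfs/nn /grid1/hadoop/hdfs/nn"
export FS_CHECKPOINT_DIR="/grid/hadoop/hdfs/snn /grid1/hadoop/hdfs/snn"
export DFS_DATA_DIR="/grid/hadoop/hdfs/dn /grid1/hadoop/hdfs/dn"
export MAPREDUCE_LOCAL_DIR="/grid/hadoop/mapred /grid1/hadoop/mapred /grid2/hadoop/mapred"
export HDFS_USER=hdfs
export MAPRED_USER=mapred
export HADOOP_GROUP=hadoop
export HDFS_LOG_DIR=/var/log/hadoop/hdfs
export MAPRED_LOG_DIR=/var/log/hadoop/mapred
export HDFS_PID_DIR=/var/run/hadoop/hdfs
export MAPRED_PID_DIR=/var/run/hadoop/mapred
```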
On the node that hosts the NameNode service, execute the following commands:
mkdir -p $DFS_NAME_DIR
chown -R $HDFS_USER:$HADOOP_GROUP $DFS_NAME_DIR
chmod -R 755 $DFS_NAME_DIR
where:
- $DFS_NAME_DIR is the space-separated list of directories where the NameNode stores the file system image. For example, /grid/hadoop/hdfs/nn /grid1/hadoop/hdfs/nn.
- $HDFS_USER is the user owning the HDFS services. For example, hdfs.
- $HADOOP_GROUP is a common group shared by services. For example, hadoop.
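These commands rely on the variable holding a space-separated list and being expanded unquoted, so each path becomes a separate argument. A minimal sketch, using throwaway paths under /tmp rather than real cluster directories (chown is omitted because it requires root):

```shell
# Illustrative paths only; real values come from your sourced environment files.
DFS_NAME_DIR="/tmp/demo/grid/hadoop/hdfs/nn /tmp/demo/grid1/hadoop/hdfs/nn"

# Leave $DFS_NAME_DIR unquoted so the shell splits it into one
# argument per directory; mkdir -p then creates each path.
mkdir -p $DFS_NAME_DIR
chmod -R 755 $DFS_NAME_DIR

ls -d $DFS_NAME_DIR
```

The same expansion pattern applies to every space-separated directory list in this section.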
On all the nodes that can potentially host the SecondaryNameNode service, execute the following commands:
mkdir -p $FS_CHECKPOINT_DIR
chown -R $HDFS_USER:$HADOOP_GROUP $FS_CHECKPOINT_DIR
chmod -R 755 $FS_CHECKPOINT_DIR
where:
- $FS_CHECKPOINT_DIR is the space-separated list of directories where the SecondaryNameNode should store the checkpoint image. For example, /grid/hadoop/hdfs/snn /grid1/hadoop/hdfs/snn.
- $HDFS_USER is the user owning the HDFS services. For example, hdfs.
- $HADOOP_GROUP is a common group shared by services. For example, hadoop.
On all DataNodes, execute the following commands:
mkdir -p $DFS_DATA_DIR
chown -R $HDFS_USER:$HADOOP_GROUP $DFS_DATA_DIR
chmod -R 750 $DFS_DATA_DIR
On the JobTracker and all DataNodes, execute the following commands:
mkdir -p $MAPREDUCE_LOCAL_DIR
chown -R $MAPRED_USER:$HADOOP_GROUP $MAPREDUCE_LOCAL_DIR
chmod -R 755 $MAPREDUCE_LOCAL_DIR
where:
- $DFS_DATA_DIR is the space-separated list of directories where DataNodes should store the blocks. For example, /grid/hadoop/hdfs/dn /grid1/hadoop/hdfs/dn.
- $HDFS_USER is the user owning the HDFS services. For example, hdfs.
- $MAPREDUCE_LOCAL_DIR is the space-separated list of directories where MapReduce should store temporary data. For example, /grid/hadoop/mapred /grid1/hadoop/mapred /grid2/hadoop/mapred.
- $MAPRED_USER is the user owning the MapReduce services. For example, mapred.
- $HADOOP_GROUP is a common group shared by services. For example, hadoop.
On all nodes, execute the following commands:
mkdir -p $HDFS_LOG_DIR
chown -R $HDFS_USER:$HADOOP_GROUP $HDFS_LOG_DIR
chmod -R 755 $HDFS_LOG_DIR

mkdir -p $MAPRED_LOG_DIR
chown -R $MAPRED_USER:$HADOOP_GROUP $MAPRED_LOG_DIR
chmod -R 755 $MAPRED_LOG_DIR

mkdir -p $HDFS_PID_DIR
chown -R $HDFS_USER:$HADOOP_GROUP $HDFS_PID_DIR
chmod -R 755 $HDFS_PID_DIR

mkdir -p $MAPRED_PID_DIR
chown -R $MAPRED_USER:$HADOOP_GROUP $MAPRED_PID_DIR
chmod -R 755 $MAPRED_PID_DIR
where:
- $HDFS_LOG_DIR is the directory for storing the HDFS logs. This directory name is a combination of a directory and the $HDFS_USER. For example, /var/log/hadoop/hdfs, where hdfs is the $HDFS_USER.
- $HDFS_PID_DIR is the directory for storing the HDFS process ID. This directory name is a combination of a directory and the $HDFS_USER. For example, /var/run/hadoop/hdfs, where hdfs is the $HDFS_USER.
- $MAPRED_LOG_DIR is the directory for storing the MapReduce logs. This directory name is a combination of a directory and the $MAPRED_USER. For example, /var/log/hadoop/mapred, where mapred is the $MAPRED_USER.
- $MAPRED_PID_DIR is the directory for storing the MapReduce process ID. This directory name is a combination of a directory and the $MAPRED_USER. For example, /var/run/hadoop/mapred, where mapred is the $MAPRED_USER.
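After creating the directories, you may want to confirm that the owner, group, and mode came out as intended. A minimal verification sketch, using a throwaway path under /tmp owned by the current user (on a real node you would point `stat` at the directories above):

```shell
# Create a demo directory and set its mode, then read back
# owner:group and the octal permissions with GNU stat.
DEMO_DIR=/tmp/hadoop-perm-demo
mkdir -p "$DEMO_DIR"
chmod 755 "$DEMO_DIR"

# %U:%G prints user and group; %a prints the octal mode.
stat -c '%U:%G %a' "$DEMO_DIR"
```

On the actual hosts, substituting a directory such as $DFS_NAME_DIR should report the $HDFS_USER:$HADOOP_GROUP ownership and the 755 (or 750 for $DFS_DATA_DIR) mode set above.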

