Use the following instructions to manually add DataNode or TaskTracker hosts:
On each of the newly added slave nodes, add the HDP repository to yum:
wget -nv http://public-repo-1.hortonworks.com/HDP/repos/centos6/hdp.repo -O /etc/yum.repos.d/hdp.repo
yum clean all
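Optionally, confirm that the repository was registered before installing packages. A minimal check, assuming the repository id contains "HDP" (exact output varies by yum version):
# List enabled repositories and look for the HDP entry
yum repolist | grep -i hdp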
On each of the newly added slave nodes, install HDFS and MapReduce.
On RHEL and CentOS:
yum install hadoop hadoop-libhdfs hadoop-native
yum install hadoop-pipes hadoop-sbin openssl
On SLES:
zypper install hadoop hadoop-libhdfs hadoop-native
zypper install hadoop-pipes hadoop-sbin openssl
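Optionally, verify on each new node that the packages installed above are present. A minimal check against the RPM database (package names as used in the install commands):
# Confirm the Hadoop packages are installed
rpm -qa | grep hadoop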
On each of the newly added slave nodes, install Snappy compression/decompression library:
Check if Snappy is already installed:
rpm -qa | grep snappy
Install Snappy on the new nodes:
For RHEL/CentOS:
yum install snappy snappy-devel
For SLES:
zypper install snappy snappy-devel
Link the Snappy library into the Hadoop native library directory:
ln -sf /usr/lib64/libsnappy.so /usr/lib/hadoop/lib/native/Linux-amd64-64/.
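To confirm the link was created where Hadoop's native loader expects it, an optional check (the Linux-amd64-64 directory name applies to 64-bit systems, as in the command above):
# Verify the symlink points at the installed Snappy library
ls -l /usr/lib/hadoop/lib/native/Linux-amd64-64/libsnappy.so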
Optional - Install the LZO compression library.
On RHEL and CentOS:
yum install lzo-devel hadoop-lzo-native
On SLES:
zypper install lzo-devel hadoop-lzo-native
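As with Snappy, you can optionally confirm that the LZO packages installed:
# Check the RPM database for the LZO packages installed above
rpm -qa | grep -i lzo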
Copy the Hadoop configurations to the newly added slave nodes and set appropriate permissions.
Option I: Copy Hadoop config files from an existing slave node.
On an existing slave node, make a copy of the current configurations:
tar zcvf hadoop_conf.tgz /etc/hadoop/conf
Copy this file to each of the new nodes, then extract it there and set permissions:
rm -rf /etc/hadoop/conf
cd /
tar zxvf $location_of_copied_conf_tar_file/hadoop_conf.tgz
chmod -R 755 /etc/hadoop/conf
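The copy to each new node is not shown above; a minimal sketch using scp, assuming root SSH access and a hypothetical hostname new-slave-01.example.com (the destination directory then becomes $location_of_copied_conf_tar_file in the extraction step):
# Run on the existing slave node where hadoop_conf.tgz was created
scp hadoop_conf.tgz root@new-slave-01.example.com:/tmp/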
Option II: Manually add Hadoop configuration files.
Download the core Hadoop configuration files from here and extract the files under the configuration_files -> core_hadoop directory to a temporary location.
In the temporary directory, locate the following files and modify the properties based on your environment. Search for TODO in the files for the properties to replace.
Table 6.1. core-site.xml

Property | Example | Description
fs.default.name | hdfs://{namenode.full.hostname}:8020 | Enter your NameNode hostname
fs.checkpoint.dir | /grid/hadoop/hdfs/snn | A comma-separated list of paths. Use the list of directories from $FS_CHECKPOINT_DIR.

Table 6.2. hdfs-site.xml

Property | Example | Description
dfs.name.dir | /grid/hadoop/hdfs/nn,/grid1/hadoop/hdfs/nn | Comma-separated list of paths. Use the list of directories from $DFS_NAME_DIR.
dfs.data.dir | /grid/hadoop/hdfs/dn,/grid1/hadoop/hdfs/dn | Comma-separated list of paths. Use the list of directories from $DFS_DATA_DIR.
dfs.http.address | {namenode.full.hostname}:50070 | Enter your NameNode hostname for http access
dfs.secondary.http.address | {secondary.namenode.full.hostname}:50090 | Enter your SecondaryNameNode hostname
dfs.https.address | {namenode.full.hostname}:50470 | Enter your NameNode hostname for https access

Table 6.3. mapred-site.xml

Property | Example | Description
mapred.job.tracker | {jobtracker.full.hostname}:50300 | Enter your JobTracker hostname
mapred.job.tracker.http.address | {jobtracker.full.hostname}:50030 | Enter your JobTracker hostname
mapred.local.dir | /grid/hadoop/mapred,/grid1/hadoop/mapred | Comma-separated list of paths. Use the list of directories from $MAPREDUCE_LOCAL_DIR.
mapreduce.tasktracker.group | hadoop | Enter your group. Use the value of $HADOOP_GROUP.
mapreduce.history.server.http.address | {jobtracker.full.hostname}:51111 | Enter your JobTracker hostname

Table 6.4. taskcontroller.cfg

Property | Example | Description
mapred.local.dir | /grid/hadoop/mapred,/grid1/hadoop/mapred | Comma-separated list of paths. Use the list of directories from $MAPREDUCE_LOCAL_DIR.
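A quick way to confirm you have replaced every placeholder before copying the files out, assuming the files were extracted to a temporary directory such as /tmp/core_hadoop (the path is an example, not part of the original instructions):
# Any remaining TODO markers indicate properties that still need values
grep -n TODO /tmp/core_hadoop/*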
Create the config directory on all hosts in your cluster, copy in all the configuration files, and set permissions.
rm -r $HADOOP_CONF_DIR
mkdir -p $HADOOP_CONF_DIR
<copy all the config files to $HADOOP_CONF_DIR>
chmod a+x $HADOOP_CONF_DIR/
chown -R $HDFS_USER:$HADOOP_GROUP $HADOOP_CONF_DIR/../
chmod -R 755 $HADOOP_CONF_DIR/../
where:
$HADOOP_CONF_DIR is the directory for storing the Hadoop configuration files. For example, /etc/hadoop/conf.
$HDFS_USER is the user owning the HDFS services. For example, hdfs.
$HADOOP_GROUP is a common group shared by services. For example, hadoop.
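For reference, a minimal way to set these variables in the shell before running the commands above, using the example values just given (adjust to your environment):
# Example values only; substitute your own path, user, and group
HADOOP_CONF_DIR=/etc/hadoop/conf
HDFS_USER=hdfs
HADOOP_GROUP=hadoop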
On each of the newly added slave nodes, start HDFS:
su - hdfs -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start datanode"
On each of the newly added slave nodes, start MapReduce:
su - mapred -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start tasktracker"
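To confirm the daemons came up on a new slave node, an optional check of the process list (the JVM process names are DataNode and TaskTracker):
# Each command should show one running JVM for the corresponding daemon
ps -ef | grep -i datanode | grep -v grep
ps -ef | grep -i tasktracker | grep -v grep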
Add new slave nodes.
To add a new NameNode slave (DataNode):
On the NameNode host machine, edit the /etc/hadoop/conf/dfs.include file and add the list of slave nodes' hostnames (one hostname per line).
Important: If the NameNode host machine does not have an existing copy of this file, create a new dfs.include file.
On the NameNode host machine, execute the following command:
su - hdfs -c "hadoop dfsadmin -refreshNodes"
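A minimal end-to-end sketch of the two steps above, assuming a hypothetical new slave hostname new-slave-01.example.com:
# Append the new host to dfs.include (create the file first if it does not exist)
echo "new-slave-01.example.com" >> /etc/hadoop/conf/dfs.include
# Tell the NameNode to re-read the include file
su - hdfs -c "hadoop dfsadmin -refreshNodes"
# Optionally confirm the new DataNode appears in the cluster report
su - hdfs -c "hadoop dfsadmin -report"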
To add a new JobTracker slave (TaskTracker):
On the JobTracker host machine, edit the /etc/hadoop/conf/mapred.include file and add the list of slave nodes' hostnames (one hostname per line).
Important: If the JobTracker host machine does not have an existing copy of this file, create a new mapred.include file.
On the JobTracker host machine, execute the following command:
su - mapred -c "hadoop mradmin -refreshNodes"
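A matching sketch for the JobTracker side, again using the hypothetical hostname new-slave-01.example.com:
# Append the new host to mapred.include (create the file first if it does not exist)
echo "new-slave-01.example.com" >> /etc/hadoop/conf/mapred.include
# Tell the JobTracker to re-read the include file
su - mapred -c "hadoop mradmin -refreshNodes"
# Optionally list the active TaskTrackers to confirm the new node registered
su - mapred -c "hadoop job -list-active-trackers"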
Optional - Enable monitoring on the newly added slave nodes using the instructions provided here.
Optional - Enable cluster alerting on the newly added slave nodes using the instructions provided here.

