Migrate the HDP Configurations
Configurations and configuration file names have changed between HDP 1.3.2 (Hadoop 1.2.x) and HDP 2.1 (Hadoop 2.4). To upgrade to HDP 2.x, back up your current configuration files, download the new HDP 2.1 files, and compare. The following tables provide mapping information to make the comparison between releases easier.
To migrate the HDP Configurations
Back up the following HDP 1.x configurations on all nodes in your clusters (a sample backup command follows this list).
/etc/hadoop/conf
/etc/hbase/conf
/etc/hcatalog/conf (Note: With HDP 2.1, /etc/hcatalog/conf is divided into /etc/hive-hcatalog/conf and /etc/hive-webhcat. You cannot use /etc/hcatalog/conf in HDP 2.1.)
/etc/hive/conf
/etc/pig/conf
/etc/sqoop/conf
/etc/flume/conf
/etc/mahout/conf
/etc/oozie/conf
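For example, a minimal backup sketch, assuming a staging directory such as /tmp/hdp1-conf-backup (replace it with whatever location you prefer):

```bash
# Hypothetical staging directory for the HDP 1.x configuration backups.
BACKUP_DIR=/tmp/hdp1-conf-backup
mkdir -p "$BACKUP_DIR"

# Copy each configuration directory, preserving ownership, permissions, and symlinks.
for component in hadoop hbase hcatalog hive pig sqoop flume mahout oozie; do
    cp -a "/etc/${component}/conf" "${BACKUP_DIR}/${component}-conf"
done
```

Run this on every node (or drive it with your usual remote-execution tooling) so that each node's local configuration is preserved.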
Edit /etc/hadoop/conf/core-site.xml and set hadoop.rpc.protection from none to authentication (see the example after the following note).
Note: Hadoop lets cluster administrators control the quality of protection (QOP) with the hadoop.rpc.protection parameter in core-site.xml. This parameter is optional in HDP 2.2; if it is not present, the default QOP setting of "auth" (authentication only) is used.
Valid values for this parameter are:
"authentication" : corresponds to "auth"
"integrity" : corresponds to "auth-int"
"privacy" : corresponds to "auth-conf"
The default setting is authentication only because integrity checking and encryption incur a performance cost.
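For reference, a minimal core-site.xml entry for this setting might look like the following sketch (authentication is the default, authentication-only value):

```xml
<property>
  <name>hadoop.rpc.protection</name>
  <value>authentication</value>
  <description>RPC quality of protection: authentication, integrity, or privacy.</description>
</property>
```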
Copy your /etc/hcatalog/conf configurations to /etc/hive-hcatalog/conf and /etc/hive-webhcat as appropriate.
Copy log4j.properties from the hadoop config directory of the companion files to /etc/hadoop/conf. The file should have owners and permissions similar to other files in /etc/hadoop/conf.
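A minimal sketch of this step, assuming the companion files were unpacked to a hypothetical path such as /tmp/hdp2-companion-files (the exact layout of the archive may differ):

```bash
# Hypothetical location where the HDP 2.x companion files were extracted.
COMPANION=/tmp/hdp2-companion-files

# Copy log4j.properties from the hadoop config directory of the companion files.
cp "${COMPANION}/hadoop/conf/log4j.properties" /etc/hadoop/conf/

# Give the file owners and permissions similar to the other files in /etc/hadoop/conf
# (hdfs:hadoop and mode 644 are typical, but verify against your existing files).
chown hdfs:hadoop /etc/hadoop/conf/log4j.properties
chmod 644 /etc/hadoop/conf/log4j.properties
```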
Download your HDP 2.x companion files (see "Download the Companion Files" in Chapter 1 of the Manual Install Guide) and migrate your HDP 1.x configuration.
Copy these configurations to all nodes in your clusters.
/etc/hadoop/conf
/etc/hbase/conf
/etc/hcatalog/conf
/etc/hive/conf
/etc/pig/conf
/etc/sqoop/conf
/etc/flume/conf
/etc/mahout/conf
/etc/oozie/conf
/etc/zookeeper/conf
Note: Upgrading the repo using yum or zypper resets all configurations. Prepare to replace these configuration directories each time you perform a yum or zypper upgrade.
Review the following HDP 1.3.2 Hadoop Core configurations and the new configurations or locations in HDP 2.x. An example of one of these renames follows the table.
Table 3.4. HDP 1.3.2 Hadoop Core Site (core-site.xml)
| HDP 1.3.2 config | HDP 1.3.2 config file | HDP 2.2 config | HDP 2.2 config file |
|---|---|---|---|
| fs.default.name | core-site.xml | fs.defaultFS | core-site.xml |
| fs.checkpoint.dir | core-site.xml | dfs.namenode.checkpoint.dir | hdfs-site.xml |
| fs.checkpoint.edits.dir | core-site.xml | dfs.namenode.checkpoint.edits.dir | hdfs-site.xml |
| fs.checkpoint.period | core-site.xml | dfs.namenode.checkpoint.period | hdfs-site.xml |
| io.bytes.per.checksum | core-site.xml | dfs.bytes-per-checksum | hdfs-site.xml |
| dfs.df.interval | hdfs-site.xml | fs.df.interval | core-site.xml |
| hadoop.native.lib | core-site.xml | io.native.lib.available | core-site.xml |
| hadoop.configured.node.mapping | -- | net.topology.configured.node.mapping | core-site.xml |
| topology.node.switch.mapping.impl | core-site.xml | net.topology.node.switch.mapping.impl | core-site.xml |
| topology.script.file.name | core-site.xml | net.topology.script.file.name | core-site.xml |
| topology.script.number.args | core-site.xml | net.topology.script.number.args | core-site.xml |
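As an illustration of how these renames look in practice, an HDP 1.3.2 entry such as fs.default.name becomes fs.defaultFS in HDP 2.x (the NameNode host below is a placeholder):

```xml
<!-- HDP 1.3.2 core-site.xml -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode.example.com:8020</value>
</property>

<!-- HDP 2.x core-site.xml -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode.example.com:8020</value>
</property>
```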
Note: The hadoop.rpc.protection configuration property in core-site.xml must specify authentication, integrity, and/or privacy. No value defaults to authentication, but an invalid value such as "none" causes an error.
Review the following HDP 1.3.2 HDFS site configurations and their new configurations and files in HDP 2.x. A quick way to check for the old names follows the table.
Table 3.5. HDP 1.3.2 HDFS Site (hdfs-site.xml)
| HDP 1.3.2 config | HDP 1.3.2 config file | HDP 2.2 config | HDP 2.2 config file |
|---|---|---|---|
| dfs.block.size | hdfs-site.xml | dfs.blocksize | hdfs-site.xml |
| dfs.write.packet.size | hdfs-site.xml | dfs.client-write-packet-size | hdfs-site.xml |
| dfs.https.client.keystore.resource | hdfs-site.xml | dfs.client.https.keystore.resource | hdfs-site.xml |
| dfs.https.need.client.auth | hdfs-site.xml | dfs.client.https.need-auth | hdfs-site.xml |
| dfs.read.prefetch.size | hdfs-site.xml | dfs.client.read.prefetch.size | hdfs-site.xml |
| dfs.socket.timeout | hdfs-site.xml | dfs.client.socket-timeout | hdfs-site.xml |
| dfs.balance.bandwidthPerSec | hdfs-site.xml | dfs.datanode.balance.bandwidthPerSec | hdfs-site.xml |
| dfs.data.dir | hdfs-site.xml | dfs.datanode.data.dir | hdfs-site.xml |
| dfs.datanode.max.xcievers | hdfs-site.xml | dfs.datanode.max.transfer.threads | hdfs-site.xml |
| session.id | hdfs-site.xml | dfs.metrics.session-id | hdfs-site.xml |
| dfs.access.time.precision | hdfs-site.xml | dfs.namenode.accesstime.precision | hdfs-site.xml |
| dfs.backup.address | hdfs-site.xml | dfs.namenode.backup.address | hdfs-site.xml |
| dfs.backup.http.address | hdfs-site.xml | dfs.namenode.backup.http-address | hdfs-site.xml |
| fs.checkpoint.dir | hdfs-site.xml | dfs.namenode.checkpoint.dir | hdfs-site.xml |
| fs.checkpoint.edits.dir | hdfs-site.xml | dfs.namenode.checkpoint.edits.dir | hdfs-site.xml |
| fs.checkpoint.period | hdfs-site.xml | dfs.namenode.checkpoint.period | hdfs-site.xml |
| dfs.name.edits.dir | hdfs-site.xml | dfs.namenode.edits.dir | hdfs-site.xml |
| heartbeat.recheck.interval | hdfs-site.xml | dfs.namenode.heartbeat.recheck-interval | hdfs-site.xml |
| dfs.http.address | hdfs-site.xml | dfs.namenode.http-address | hdfs-site.xml |
| dfs.https.address | hdfs-site.xml | dfs.namenode.https-address | hdfs-site.xml |
| dfs.max.objects | hdfs-site.xml | dfs.namenode.max.objects | hdfs-site.xml |
| dfs.name.dir | hdfs-site.xml | dfs.namenode.name.dir | hdfs-site.xml |
| dfs.name.dir.restore | hdfs-site.xml | dfs.namenode.name.dir.restore | hdfs-site.xml |
| dfs.replication.considerLoad | hdfs-site.xml | dfs.namenode.replication.considerLoad | hdfs-site.xml |
| dfs.replication.interval | hdfs-site.xml | dfs.namenode.replication.interval | hdfs-site.xml |
| dfs.max-repl-streams | hdfs-site.xml | dfs.namenode.replication.max-streams | hdfs-site.xml |
| dfs.replication.min | hdfs-site.xml | dfs.namenode.replication.min | hdfs-site.xml |
| dfs.replication.pending.timeout.sec | hdfs-site.xml | dfs.namenode.replication.pending.timeout-sec | hdfs-site.xml |
| dfs.safemode.extension | hdfs-site.xml | dfs.namenode.safemode.extension | hdfs-site.xml |
| dfs.safemode.threshold.pct | hdfs-site.xml | dfs.namenode.safemode.threshold-pct | hdfs-site.xml |
| dfs.secondary.http.address | hdfs-site.xml | dfs.namenode.secondary.http-address | hdfs-site.xml |
| dfs.permissions | hdfs-site.xml | dfs.permissions.enabled | hdfs-site.xml |
| dfs.permissions.supergroup | hdfs-site.xml | dfs.permissions.superusergroup | hdfs-site.xml |
| dfs.df.interval | hdfs-site.xml | fs.df.interval | core-site.xml |
| dfs.umaskmode | hdfs-site.xml | fs.permissions.umask-mode | hdfs-site.xml |
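One quick way to find lingering old names in the migrated files is a simple search; this is only a sketch, and the pattern covers just a few of the properties from the table above:

```bash
# Search the migrated HDFS configuration for a handful of renamed HDP 1.3.2 properties.
# Extend the pattern with any other old names you want to check.
grep -nE 'dfs\.block\.size|dfs\.data\.dir|dfs\.name\.dir|dfs\.permissions\.supergroup' \
    /etc/hadoop/conf/hdfs-site.xml
```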
Review the following HDP 1.3.2 MapReduce Configs and their new HDP 2.x mappings.
Table 3.6. HDP 1.3.2 MapReduce Configs and HDP 2.x Mappings (mapred-site.xml)
| HDP 1.3.2 config | HDP 1.3.2 config file | HDP 2.2 config | HDP 2.2 config file |
|---|---|---|---|
| mapred.map.child.java.opts | mapred-site.xml | mapreduce.map.java.opts | mapred-site.xml |
| mapred.job.map.memory.mb | mapred-site.xml | mapreduce.map.memory.mb | mapred-site.xml |
| mapred.reduce.child.java.opts | mapred-site.xml | mapreduce.reduce.java.opts | mapred-site.xml |
| mapred.job.reduce.memory.mb | mapred-site.xml | mapreduce.reduce.memory.mb | mapred-site.xml |
| security.task.umbilical.protocol.acl | mapred-site.xml | security.job.task.protocol.acl | mapred-site.xml |

Review the following HDP 1.3.2 configs and their new HDP 2.x Capacity Scheduler mappings.
Table 3.7. HDP 1.3.2 Configs now in Capacity Scheduler for HDP 2.x (capacity-scheduler.xml)
| HDP 1.3.2 config | HDP 1.3.2 config file | HDP 2.2 config | HDP 2.2 config file |
|---|---|---|---|
| mapred.queue.names | mapred-site.xml | yarn.scheduler.capacity.root.queues | capacity-scheduler.xml |
| mapred.queue.default.acl-submit-job | mapred-queue-acls.xml | yarn.scheduler.capacity.root.default.acl_submit_jobs | capacity-scheduler.xml |
| mapred.queue.default.acl-administer-jobs | mapred-queue-acls.xml | yarn.scheduler.capacity.root.default.acl_administer_jobs | capacity-scheduler.xml |
| mapred.capacity-scheduler.queue.default.capacity | capacity-scheduler.xml | yarn.scheduler.capacity.root.default.capacity | capacity-scheduler.xml |
| mapred.capacity-scheduler.queue.default.user-limit-factor | capacity-scheduler.xml | yarn.scheduler.capacity.root.default.user-limit-factor | capacity-scheduler.xml |
| mapred.capacity-scheduler.queue.default.maximum-capacity | capacity-scheduler.xml | yarn.scheduler.capacity.root.default.maximum-capacity | capacity-scheduler.xml |
| mapred.queue.default.state | capacity-scheduler.xml | yarn.scheduler.capacity.root.default.state | capacity-scheduler.xml |

Compare the following HDP 1.3.2 configs in hadoop-env.sh with the new configs in HDP 2.x.
Paths have changed in HDP 2.2 to /usr/hdp/current. You must remove lines such as:
export JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-amd64-64

Table 3.8. HDP 1.3.2 Configs and HDP 2.x for hadoop-env.sh
| HDP 1.3.2 config | HDP 2.2 config | Description |
|---|---|---|
| JAVA_HOME | JAVA_HOME | Java implementation to use |
| HADOOP_HOME_WARN_SUPPRESS | HADOOP_HOME_WARN_SUPPRESS | -- |
| HADOOP_CONF_DIR | HADOOP_CONF_DIR | Hadoop configuration directory |
| not in hadoop-env.sh | HADOOP_HOME | -- |
| not in hadoop-env.sh | HADOOP_LIBEXEC_DIR | -- |
| HADOOP_NAMENODE_INIT_HEAPSIZE | HADOOP_NAMENODE_INIT_HEAPSIZE | -- |
| HADOOP_OPTS | HADOOP_OPTS | Extra Java runtime options; empty by default |
| HADOOP_NAMENODE_OPTS | HADOOP_NAMENODE_OPTS | Command-specific options appended to HADOOP_OPTS |
| HADOOP_JOBTRACKER_OPTS | not in hadoop-env.sh | Command-specific options appended to HADOOP_OPTS |
| HADOOP_TASKTRACKER_OPTS | not in hadoop-env.sh | Command-specific options appended to HADOOP_OPTS |
| HADOOP_DATANODE_OPTS | HADOOP_DATANODE_OPTS | Command-specific options appended to HADOOP_OPTS |
| HADOOP_BALANCER_OPTS | HADOOP_BALANCER_OPTS | Command-specific options appended to HADOOP_OPTS |
| HADOOP_SECONDARYNAMENODE_OPTS | HADOOP_SECONDARYNAMENODE_OPTS | Command-specific options appended to HADOOP_OPTS |
| HADOOP_CLIENT_OPTS | HADOOP_CLIENT_OPTS | Applies to multiple commands (fs, dfs, fsck, distcp, etc.) |
| HADOOP_SECURE_DN_USER | not in hadoop-env.sh | Secure DataNodes: user to run the DataNode as |
| HADOOP_SSH_OPTS | HADOOP_SSH_OPTS | Extra ssh options |
| HADOOP_LOG_DIR | HADOOP_LOG_DIR | Directory where log files are stored |
| HADOOP_SECURE_DN_LOG_DIR | HADOOP_SECURE_DN_LOG_DIR | Directory where log files are stored in the secure data environment |
| HADOOP_PID_DIR | HADOOP_PID_DIR | Directory where pid files are stored; /tmp by default |
| HADOOP_SECURE_DN_PID_DIR | HADOOP_SECURE_DN_PID_DIR | Directory where pid files are stored in the secure data environment; /tmp by default |
| HADOOP_IDENT_STRING | HADOOP_IDENT_STRING | String representing this instance of Hadoop; $USER by default |
| not in hadoop-env.sh | HADOOP_MAPRED_LOG_DIR | -- |
| not in hadoop-env.sh | HADOOP_MAPRED_PID_DIR | -- |
| not in hadoop-env.sh | JAVA_LIBRARY_PATH | -- |
| not in hadoop-env.sh | JSVC_HOME | For starting the DataNode on a secure cluster |
Note: Some of the configuration settings refer to the variable HADOOP_HOME. The value of HADOOP_HOME is automatically inferred from the location of the startup scripts. HADOOP_HOME is the parent directory of the bin directory that holds the Hadoop scripts. In many instances this is $HADOOP_INSTALL/hadoop.
Add the following properties to the yarn-site.xml file:
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>$resourcemanager.full.hostname:8025</value>
  <description>Enter your ResourceManager hostname.</description>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>$resourcemanager.full.hostname:8030</value>
  <description>Enter your ResourceManager hostname.</description>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>$resourcemanager.full.hostname:8050</value>
  <description>Enter your ResourceManager hostname.</description>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>$resourcemanager.full.hostname:8141</value>
  <description>Enter your ResourceManager hostname.</description>
</property>
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/grid/hadoop/yarn/local,/grid1/hadoop/yarn/local</value>
  <description>Comma-separated list of paths. Use the list of directories from $YARN_LOCAL_DIR. For example, /grid/hadoop/yarn/local,/grid1/hadoop/yarn/local.</description>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/grid/hadoop/yarn/log</value>
  <description>Use the list of directories from $YARN_LOCAL_LOG_DIR. For example, /grid/hadoop/yarn/log,/grid1/hadoop/yarn/log,/grid2/hadoop/yarn/log.</description>
</property>
<property>
  <name>yarn.log.server.url</name>
  <value>http://$jobhistoryserver.full.hostname:19888/jobhistory/logs/</value>
  <description>URL for job history server</description>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>$resourcemanager.full.hostname:8088</value>
  <description>Web application address for the ResourceManager.</description>
</property>
<property>
  <name>yarn.nodemanager.admin-env</name>
  <value>MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX</value>
  <description>Restrict the number of memory arenas to prevent excessive VMEM use by the glibc arena allocator. For example, MALLOC_ARENA_MAX=4.</description>
</property>
Add the following properties to the mapred-site.xml file:
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>$jobhistoryserver.full.hostname:10020</value>
  <description>Enter your JobHistoryServer hostname.</description>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>$jobhistoryserver.full.hostname:19888</value>
  <description>Enter your JobHistoryServer hostname.</description>
</property>
<property>
  <name>mapreduce.shuffle.port</name>
  <value>13562</value>
</property>
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
For a secure cluster, add the following properties to mapred-site.xml:
<property>
  <name>mapreduce.jobhistory.principal</name>
  <value>jhs/_PRINCIPAL@$REALM.ACME.COM</value>
  <description>Kerberos principal name for the MapReduce JobHistory Server.</description>
</property>
<property>
  <name>mapreduce.jobhistory.keytab</name>
  <value>/etc/security/keytabs/jhs.service.keytab</value>
  <description>Kerberos keytab file for the MapReduce JobHistory Server.</description>
</property>
For a secure cluster, you must also update hadoop.security.auth_to_local in core-site.xml to include a rule for the mapreduce.jobhistory.principal value you set in the previous step:
RULE:[2:$1@$0](PRINCIPAL@$REALM.ACME.COM)s/.*/mapred/
where PRINCIPAL and REALM are the Kerberos principal and realm you specified in mapreduce.jobhistory.principal.
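For example, assuming the placeholder realm used elsewhere in this section ($REALM.ACME.COM) and a JobHistory Server principal whose first component is jhs, the property might look like this sketch; keep any mapping rules you already have and end the list with DEFAULT:

```xml
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[2:$1@$0](jhs@$REALM.ACME.COM)s/.*/mapred/
    DEFAULT
  </value>
  <description>Maps the MapReduce JobHistory Server principal to the local mapred user.</description>
</property>
```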
Delete any remaining HDP 1.x properties from the mapred-site.xml file.
Replace the default memory configuration settings in yarn-site.xml and mapred-site.xml with the YARN and MapReduce memory configuration settings you calculated previously.
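For illustration only, a node sized for 8 GB of YARN memory might end up with values in the neighborhood of the following; these numbers are assumptions for the example, not recommendations, so substitute the figures you calculated for your own hardware:

```xml
<!-- yarn-site.xml: illustrative values only -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>

<!-- mapred-site.xml: illustrative values only -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx819m</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx1638m</value>
</property>
```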

