DistCp Between HA Clusters
To copy data between HA clusters, use the dfs.internal.nameservices property
in the hdfs-site.xml file to explicitly specify the name services belonging to
the local cluster, while continuing to use the dfs.nameservices property to
specify all of the name services in the local and remote clusters.
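For illustration, the distinction looks like this in the hdfs-site.xml file on the HAA cluster. This is a minimal sketch assuming the HAA and HAB name service names used in the steps below:

   <!-- All name services the client can address, local and remote -->
   <property>
     <name>dfs.nameservices</name>
     <value>HAA,HAB</value>
   </property>
   <!-- Only the name service owned by the local cluster -->
   <property>
     <name>dfs.internal.nameservices</name>
     <value>HAA</value>
   </property>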
Use the following steps to copy data between HA clusters:
1. Modify the following properties in the hdfs-site.xml file in the HDFS client (the settings below are collected into an XML sketch after the note that follows this step):

   Add both name services to dfs.nameservices:

   dfs.nameservices = HAA,HAB

   Add the dfs.internal.nameservices property. On the HAA cluster, add the details of the local cluster:

   dfs.internal.nameservices = HAA

   On the HAB cluster, add the details of the local cluster:

   dfs.internal.nameservices = HAB

   Add the dfs.ha.namenodes.<nameservice> details:

   dfs.ha.namenodes.HAB = nn1,nn2

   Add the dfs.namenode.rpc-address.<cluster>.<nn> property:

   dfs.namenode.rpc-address.HAB.nn1 = <NN1_fqdn>:8020
   dfs.namenode.rpc-address.HAB.nn2 = <NN2_fqdn>:8020

   Add the following properties to enable distcp over WebHDFS and secure WebHDFS:

   dfs.namenode.http-address.HAB.nn1 = <NN1_fqdn>:50070
   dfs.namenode.http-address.HAB.nn2 = <NN2_fqdn>:50070
   dfs.namenode.https-address.HAB.nn1 = <NN1_fqdn>:50470
   dfs.namenode.https-address.HAB.nn2 = <NN2_fqdn>:50470

   Add the dfs.client.failover.proxy.provider.<cluster> property:

   dfs.client.failover.proxy.provider.HAB = org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
Note: The properties listed earlier enable copying data from the HAA cluster to the HAB cluster only. To be able to copy data from HAB to HAA, add the corresponding properties for the HAA name service in the HDFS client on the HAB cluster as well.
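Taken together, the remote-cluster entries from step 1 might appear as follows in the hdfs-site.xml file on the HAA cluster. This is an illustrative sketch using the placeholder FQDNs from the example; the http-address and https-address entries follow the same pattern as the rpc-address entries:

   <property>
     <name>dfs.ha.namenodes.HAB</name>
     <value>nn1,nn2</value>
   </property>
   <property>
     <name>dfs.namenode.rpc-address.HAB.nn1</name>
     <value><NN1_fqdn>:8020</value>
   </property>
   <property>
     <name>dfs.namenode.rpc-address.HAB.nn2</name>
     <value><NN2_fqdn>:8020</value>
   </property>
   <property>
     <name>dfs.client.failover.proxy.provider.HAB</name>
     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
   </property>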
2. Add the following property to the mapred-site.xml file in the HDFS client on the local cluster. Its value is a set of patterns that select the client configuration properties sent to the ResourceManager with the job, so that the ResourceManager can obtain and renew delegation tokens for the remote cluster:

   <property>
     <name>mapreduce.job.send-token-conf</name>
     <value>yarn.http.policy|^yarn.timeline-service.webapp.*$|^yarn.timeline-service.client.*$|hadoop.security.key.provider.path|hadoop.rpc.protection|dfs.nameservices|^dfs.namenode.rpc-address.*$|^dfs.ha.namenodes.*$|^dfs.client.failover.proxy.provider.*$|dfs.namenode.kerberos.principal|dfs.namenode.kerberos.principal.pattern|mapreduce.jobhistory.principal</value>
   </property>
Note: As with the HDFS properties, this setting enables copying data from the HAA cluster to the HAB cluster only. To be able to copy data from HAB to HAA, add the mapreduce.job.send-token-conf property in the HDFS client on the HAB cluster as well.
3. Restart the HDFS service, then run the distcp command using the name service. For example:

   hadoop distcp hdfs://HAA/tmp/testDistcp hdfs://HAB/tmp/
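Because step 1 also configured the HTTP and HTTPS addresses for WebHDFS, the same copy can be run over WebHDFS or secure WebHDFS by switching the URI scheme. The following commands are an illustrative sketch reusing the example paths, not part of the original procedure:

   hadoop distcp webhdfs://HAA/tmp/testDistcp webhdfs://HAB/tmp/

Over SSL-secured WebHDFS, use the swebhdfs scheme:

   hadoop distcp swebhdfs://HAA/tmp/testDistcp swebhdfs://HAB/tmp/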

