DistCp Between HA Clusters
To copy data between HA clusters, use the dfs.internal.nameservices property
in the hdfs-site.xml file to explicitly specify the name services belonging to
the local cluster, while continuing to use the dfs.nameservices property to
specify all of the name services in the local and remote clusters.
Use the following steps to copy data between HA clusters:
1. Create a new directory and copy the contents of the /etc/hadoop/conf directory on the local cluster to this directory. The local cluster is the cluster where you plan to run the distcp command. The following steps use distcpConf as the directory name; substitute the name of the directory you created for distcpConf.
2. In the hdfs-site.xml file in the distcpConf directory, add the nameservice ID for the remote cluster to the dfs.nameservices property, and set the dfs.internal.nameservices property to the nameservice ID of the local cluster.
   Note: localns is the nameservice ID of the local cluster and externalns is the nameservice ID of the remote cluster.

      <property>
        <name>dfs.nameservices</name>
        <value>localns,externalns</value>
      </property>
      <property>
        <name>dfs.internal.nameservices</name>
        <value>localns</value>
      </property>

3. On the remote cluster, find the hdfs-site.xml file and copy the following properties, which refer to the remote nameservice ID, to the end of the hdfs-site.xml file in the distcpConf directory you created in step 1:

      dfs.ha.namenodes.<nameserviceID>
      dfs.namenode.rpc-address.<nameserviceID>.<namenode1>
      dfs.namenode.servicerpc-address.<nameserviceID>.<namenode1>
      dfs.namenode.http-address.<nameserviceID>.<namenode1>
      dfs.namenode.https-address.<nameserviceID>.<namenode1>
      dfs.namenode.rpc-address.<nameserviceID>.<namenode2>
      dfs.namenode.servicerpc-address.<nameserviceID>.<namenode2>
      dfs.namenode.http-address.<nameserviceID>.<namenode2>
      dfs.namenode.https-address.<nameserviceID>.<namenode2>

4. Enter the following command to copy data from the remote cluster to the local cluster:

      hadoop --config distcpConf distcp hdfs://externalns/<source_directory> hdfs://localns/<destination_directory>

   If you want to run distcp on a secure cluster, you must also pass the mapreduce.job.send-token-conf property with the distcp command, as follows:

      hadoop --config distcpConf distcp -Dmapreduce.job.send-token-conf="yarn.http.policy|^yarn.timeline-service.webapp.*$|^yarn.timeline-service.client.*$|hadoop.security.key.provider.path|hadoop.rpc.protection|dfs.nameservices|^dfs.namenode.rpc-address.*$|^dfs.ha.namenodes.*$|^dfs.client.failover.proxy.provider.*$|dfs.namenode.kerberos.principal|dfs.namenode.kerberos.principal.pattern|mapreduce.jobhistory.principal" hdfs://externalns/<source_directory> hdfs://localns/<destination_directory>
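Steps 1 through 3 above can be scripted. The following is a minimal sketch using only the Python standard library (shutil and xml.etree.ElementTree). It assumes the remote cluster's hdfs-site.xml has already been fetched to a local file, and the helper names (build_distcp_conf, refers_to, find_property, set_property) are hypothetical. It also copies dfs.client.failover.proxy.provider.<nameserviceID>, which is not in the list in step 3; that is an assumption based on HDFS HA clients using a failover proxy provider per logical nameservice, so drop it if your configuration already supplies one.

```python
# Minimal sketch of steps 1-3. Assumption: the remote hdfs-site.xml was already
# fetched to a local file; all function names here are hypothetical.
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional

# Property families from step 3 that are keyed by the remote nameservice ID.
# The failover proxy provider entry is an added assumption, not in step 3's list.
REMOTE_PREFIXES = (
    "dfs.ha.namenodes.",
    "dfs.namenode.rpc-address.",
    "dfs.namenode.servicerpc-address.",
    "dfs.namenode.http-address.",
    "dfs.namenode.https-address.",
    "dfs.client.failover.proxy.provider.",
)


def find_property(root: ET.Element, name: str) -> Optional[ET.Element]:
    """Return the <property> element whose <name> equals name, or None."""
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            return prop
    return None


def set_property(root: ET.Element, name: str, value: str) -> None:
    """Set a property's <value>, appending a new <property> if it is missing."""
    prop = find_property(root, name)
    if prop is None:
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
    value_el = prop.find("value")
    if value_el is None:
        value_el = ET.SubElement(prop, "value")
    value_el.text = value


def refers_to(name: str, ns: str) -> bool:
    """True for <prefix><ns> or <prefix><ns>.<namenode>, e.g. dfs.ha.namenodes.externalns."""
    return any(name == p + ns or name.startswith(p + ns + ".") for p in REMOTE_PREFIXES)


def build_distcp_conf(local_conf, remote_site, out_dir, local_ns, remote_ns) -> Path:
    """Create out_dir (it must not exist yet) and return its hdfs-site.xml path."""
    out_dir = Path(out_dir)
    # Step 1: copy the local client configuration, e.g. /etc/hadoop/conf.
    shutil.copytree(local_conf, out_dir)
    site = out_dir / "hdfs-site.xml"
    tree = ET.parse(site)
    root = tree.getroot()

    # Step 2: add the remote ID to dfs.nameservices; only the local ID is internal.
    existing = find_property(root, "dfs.nameservices")
    text = existing.findtext("value", "") if existing is not None else ""
    ids = [n.strip() for n in text.split(",") if n.strip()] or [local_ns]
    if remote_ns not in ids:
        ids.append(remote_ns)
    set_property(root, "dfs.nameservices", ",".join(ids))
    set_property(root, "dfs.internal.nameservices", local_ns)

    # Step 3: append remote properties keyed by the remote nameservice ID,
    # leaving any property the local file already defines untouched.
    for prop in ET.parse(remote_site).getroot().findall("property"):
        name = (prop.findtext("name") or "").strip()
        if refers_to(name, remote_ns) and find_property(root, name) is None:
            root.append(prop)

    tree.write(site, encoding="utf-8", xml_declaration=True)
    return site
```

For example, build_distcp_conf("/etc/hadoop/conf", "remote-hdfs-site.xml", "distcpConf", "localns", "externalns") writes distcpConf/hdfs-site.xml. Matching on the exact "<nameserviceID>." boundary keeps a property such as dfs.namenode.rpc-address.externalns2.nn1 from being copied by mistake; review the generated file before running distcp.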

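The value of mapreduce.job.send-token-conf in the secure-cluster command is a single regular expression whose alternatives are separated by |. The sketch below assembles that expression, checks which property names it would select, and builds the distcp argument list with the -D option placed after the distcp subcommand. It assumes a property is forwarded only when its whole name matches the expression (Java String.matches semantics), with Python's re.fullmatch standing in for these simple patterns; verify the matching rule against the Hadoop version you run. Under either reading, an accidental space inside an alternative (for example "webapp. *$") silently stops it from matching. The helper names are hypothetical.

```python
# Sketch: assemble the secure-cluster DistCp invocation and check which property
# names the mapreduce.job.send-token-conf expression selects.
# Assumption: a property is forwarded when its name matches the whole expression
# (Java String.matches semantics); re.fullmatch stands in for that here.
import re
from typing import List, Optional

# The alternatives used in the secure-cluster command, one per entry, no whitespace.
TOKEN_CONF_PATTERNS = [
    "yarn.http.policy",
    "^yarn.timeline-service.webapp.*$",
    "^yarn.timeline-service.client.*$",
    "hadoop.security.key.provider.path",
    "hadoop.rpc.protection",
    "dfs.nameservices",
    "^dfs.namenode.rpc-address.*$",
    "^dfs.ha.namenodes.*$",
    "^dfs.client.failover.proxy.provider.*$",
    "dfs.namenode.kerberos.principal",
    "dfs.namenode.kerberos.principal.pattern",
    "mapreduce.jobhistory.principal",
]


def build_token_conf(patterns: List[str]) -> str:
    """Join alternatives with '|', rejecting whitespace that would block matches."""
    for pattern in patterns:
        if any(ch.isspace() for ch in pattern):
            raise ValueError(f"whitespace in pattern {pattern!r}")
    return "|".join(patterns)


def forwarded(token_conf: str, names: List[str]) -> List[str]:
    """Return the property names that the whole expression matches."""
    regex = re.compile(token_conf)
    return [name for name in names if regex.fullmatch(name)]


def distcp_argv(conf_dir: str, source: str, dest: str,
                token_conf: Optional[str] = None) -> List[str]:
    """Build argv for: hadoop --config <dir> distcp [-D...] <source> <dest>."""
    argv = ["hadoop", "--config", conf_dir, "distcp"]
    if token_conf:
        # Generic -D options follow the distcp subcommand and precede the paths.
        argv.append(f"-Dmapreduce.job.send-token-conf={token_conf}")
    return argv + [source, dest]
```

Because distcp_argv returns a list, it could be passed to subprocess.run on a host with the Hadoop client installed without any shell quoting. When typing the command in a shell instead, keep the expression inside double quotes, since an unquoted | would start a pipeline.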
