DistCp Between HA Clusters
To copy data between HA clusters, use the dfs.internal.nameservices property
in the hdfs-site.xml file to explicitly specify the name services belonging to
the local cluster, while continuing to use the dfs.nameservices property to
specify all of the name services in the local and remote clusters.
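As a rough illustration of this split, the fragment below sketches how the two properties might look in the hdfs-site.xml of one cluster. The IDs HAA (local) and HAB (remote) are the example name service IDs used in the steps on this page; substitute your own.

```xml
<!-- hdfs-site.xml on cluster A (sketch; HAA and HAB are example name service IDs) -->
<property>
  <!-- every name service that clients of this cluster should be able to resolve -->
  <name>dfs.nameservices</name>
  <value>HAA,HAB</value>
</property>
<property>
  <!-- only the name services that belong to this cluster -->
  <name>dfs.internal.nameservices</name>
  <value>HAA</value>
</property>
```

On cluster B the first value would stay the same and the second would be HAB.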
Use the following steps to copy data between HA clusters:
Modify the following properties in the hdfs-site.xml file for both cluster A
and cluster B:
Add both name services to dfs.nameservices:
    dfs.nameservices = HAA,HAB
Add the dfs.internal.nameservices property:
In cluster A:
    dfs.internal.nameservices = HAA
In cluster B:
    dfs.internal.nameservices = HAB
Add dfs.ha.namenodes.<nameservice> for the remote name service to both clusters:
In cluster A:
    dfs.ha.namenodes.HAB = nn1,nn2
In cluster B:
    dfs.ha.namenodes.HAA = nn1,nn2
Add the dfs.namenode.rpc-address.<cluster>.<nn> property:
In cluster A:
    dfs.namenode.rpc-address.HAB.nn1 = <NN1_fqdn>:8020
    dfs.namenode.rpc-address.HAB.nn2 = <NN2_fqdn>:8020
In cluster B:
    dfs.namenode.rpc-address.HAA.nn1 = <NN1_fqdn>:8020
    dfs.namenode.rpc-address.HAA.nn2 = <NN2_fqdn>:8020
Add the following properties to enable distcp over WebHDFS and secure WebHDFS:
In cluster A:
    dfs.namenode.http-address.HAB.nn1 = <NN1_fqdn>:50070
    dfs.namenode.http-address.HAB.nn2 = <NN2_fqdn>:50070
    dfs.namenode.https-address.HAB.nn1 = <NN1_fqdn>:50470
    dfs.namenode.https-address.HAB.nn2 = <NN2_fqdn>:50470
In cluster B:
    dfs.namenode.http-address.HAA.nn1 = <NN1_fqdn>:50070
    dfs.namenode.http-address.HAA.nn2 = <NN2_fqdn>:50070
    dfs.namenode.https-address.HAA.nn1 = <NN1_fqdn>:50470
    dfs.namenode.https-address.HAA.nn2 = <NN2_fqdn>:50470
Add the dfs.client.failover.proxy.provider.<cluster> property:
In cluster A:
    dfs.client.failover.proxy.provider.HAB = org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
In cluster B:
    dfs.client.failover.proxy.provider.HAA = org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
Restart the HDFS service, then run the distcp command using the name service IDs. For example:
    hadoop distcp hdfs://HAA/tmp/testDistcp hdfs://HAB/tmp/
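Written out in hdfs-site.xml form, the entries that cluster A adds for the remote name service HAB might look like the sketch below. The host names nn1.clusterb.example.com and nn2.clusterb.example.com are hypothetical stand-ins for the <NN1_fqdn> and <NN2_fqdn> placeholders; the ports and property names are the ones listed in the steps above.

```xml
<!-- Sketch of the HAB entries in cluster A's hdfs-site.xml.
     Place these inside the existing <configuration> element.
     Host names are hypothetical; use the FQDNs of cluster B's NameNodes. -->
<property>
  <name>dfs.ha.namenodes.HAB</name>
  <value>nn1,nn2</value>
</property>
<!-- RPC endpoints, used by hdfs:// paths -->
<property>
  <name>dfs.namenode.rpc-address.HAB.nn1</name>
  <value>nn1.clusterb.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.HAB.nn2</name>
  <value>nn2.clusterb.example.com:8020</value>
</property>
<!-- HTTP and HTTPS endpoints, used for WebHDFS and secure WebHDFS -->
<property>
  <name>dfs.namenode.http-address.HAB.nn1</name>
  <value>nn1.clusterb.example.com:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.HAB.nn2</name>
  <value>nn2.clusterb.example.com:50070</value>
</property>
<property>
  <name>dfs.namenode.https-address.HAB.nn1</name>
  <value>nn1.clusterb.example.com:50470</value>
</property>
<property>
  <name>dfs.namenode.https-address.HAB.nn2</name>
  <value>nn2.clusterb.example.com:50470</value>
</property>
<!-- Lets clients fail over between nn1 and nn2 of HAB -->
<property>
  <name>dfs.client.failover.proxy.provider.HAB</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```

Cluster B would carry the mirror image, with HAA in the property names and the hosts of cluster A's NameNodes in the values.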

