Using DistCp with S3
When using DistCp with data in S3, consider the following limitations:
- The -append option is not supported.
- The -diff option is not supported.
- The -atomic option causes a rename of the temporary data, which significantly increases the time to commit work at the end of the operation. Furthermore, because S3A does not offer atomic renames of directories, the -atomic operation does not actually deliver what it promises. Avoid using this option.
- All -p options, including those to preserve permissions, user and group information, attributes, checksums, and replication, are ignored.
- CRC checking between HDFS and S3 is not performed. We still recommend using the -skipcrccheck option to make clear that this is taking place, and so that if etag checksums are enabled on S3A through the property fs.s3a.etag.checksum.enabled, DistCp between HDFS and S3 does not trigger checksum-mismatch errors.
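The limitations above can be checked before a job is launched. The following is a minimal sketch, not part of DistCp: build_distcp_command is a hypothetical helper, and the host, bucket, and paths in the usage line are placeholders. It rejects the options this section says to avoid, drops -p options because they are ignored on S3, and adds -skipcrccheck. It also adds -update, on the assumption that your DistCp version accepts -skipcrccheck only together with -update; verify this against your release.

```python
import re
import shlex

# Options this section lists as unsupported or to be avoided with S3.
REJECTED = {"-append", "-diff", "-atomic"}

# Matches -p with optional preserve flags (for example -p or -pugp).
# The flag letters are an assumption based on common DistCp releases.
PRESERVE = re.compile(r"^-p[rbugpcaxt]*$")


def build_distcp_command(src, dest, options=()):
    """Hypothetical helper: return an argv list for `hadoop distcp`."""
    kept = []
    for opt in options:
        if opt in REJECTED:
            raise ValueError(f"{opt} should not be used with S3")
        if PRESERVE.match(opt):
            continue  # preserve options are ignored on S3, so drop them
        kept.append(opt)
    # Assumption: -skipcrccheck is accepted only alongside -update.
    for required in ("-update", "-skipcrccheck"):
        if required not in kept:
            kept.append(required)
    return ["hadoop", "distcp", *kept, src, dest]


if __name__ == "__main__":
    # Placeholder source and destination; substitute your own.
    argv = build_distcp_command(
        "hdfs://nn:8020/data", "s3a://example-bucket/data", ["-pugp", "-m", "20"]
    )
    print(shlex.join(argv))
```

The helper only builds and prints the command line. Passing the returned list to subprocess.run would execute it on a host where the hadoop client is installed.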

