Chapter 7. Configuring HDFS Compression
This section describes how to configure HDFS compression on Linux.
HDP supports the following compression codecs: GzipCodec, DefaultCodec, BZip2Codec, LzoCodec, and SnappyCodec. GzipCodec is typically used for HDFS compression. Use the following instructions to enable GzipCodec.
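GzipCodec and BZip2Codec correspond to the standard gzip and bzip2 formats, so their basic behavior can be observed with ordinary tools. The following is an illustrative local sketch using only the Python standard library (Hadoop's codec classes are not involved; the sample data is an assumption chosen to resemble repetitive key/value map output):

```python
import bz2
import gzip

# Repetitive, text-like data, similar in shape to typical map output.
data = b"key\tvalue\n" * 10_000

gz = gzip.compress(data)
bz = bz2.compress(data)

# Both codecs are lossless: decompressing returns the original bytes.
assert gzip.decompress(gz) == data
assert bz2.decompress(bz) == data

# Both shrink repetitive data substantially; exact ratios vary by input.
print(f"original: {len(data)} bytes")
print(f"gzip:     {len(gz)} bytes")
print(f"bzip2:    {len(bz)} bytes")
```

In practice, gzip generally favors speed while bzip2 trades speed for a higher compression ratio, which is one reason GzipCodec is the common default here.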
- Option I: To use GzipCodec with a one-time-only job:

      hadoop jar hadoop-examples-1.1.0-SNAPSHOT.jar sort \
          "-Dmapred.compress.map.output=true" \
          "-Dmapred.map.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec" \
          "-Dmapred.output.compress=true" \
          "-Dmapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec" \
          -outKey org.apache.hadoop.io.Text \
          -outValue org.apache.hadoop.io.Text input output
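With these flags set, the job's output parts are standard gzip streams (for example, a reducer output named along the lines of part-r-00000.gz) that any gzip-aware tool can read. A minimal local sketch of that round trip, using only the Python standard library (no Hadoop required; the file name is just an illustrative stand-in):

```python
import gzip
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    # Hypothetical stand-in for a compressed reducer output file.
    part = os.path.join(d, "part-r-00000.gz")

    # Write tab-separated key/value lines as a gzip stream.
    with gzip.open(part, "wt") as f:
        f.write("apple\t3\nbanana\t7\n")

    # Any gzip-aware reader (zcat, Python, downstream jobs) can read it back.
    with gzip.open(part, "rt") as f:
        lines = f.read().splitlines()

    print(lines)  # ['apple\t3', 'banana\t7']
```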
- Option II: To enable GzipCodec as the default compression:
  - Edit the core-site.xml file on the NameNode host machine:

        <property>
          <name>io.compression.codecs</name>
          <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
          <description>A list of the compression codec classes that can be
          used for compression/decompression.</description>
        </property>
  - Edit the mapred-site.xml file on the JobTracker host machine:

        <property>
          <name>mapreduce.map.output.compress</name>
          <value>true</value>
        </property>
        <property>
          <name>mapreduce.map.output.compress.codec</name>
          <value>org.apache.hadoop.io.compress.GzipCodec</value>
        </property>
        <property>
          <name>mapreduce.output.fileoutputformat.compress.type</name>
          <value>BLOCK</value>
        </property>
  - (Optional) To enable job output compression, set the following two configuration parameters. Edit the mapred-site.xml file on the Resource Manager host machine:

        <property>
          <name>mapreduce.output.fileoutputformat.compress</name>
          <value>true</value>
        </property>
        <property>
          <name>mapreduce.output.fileoutputformat.compress.codec</name>
          <value>org.apache.hadoop.io.compress.GzipCodec</value>
        </property>
- Restart the cluster using the applicable commands in Controlling HDP Services Manually. 
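Before restarting the cluster, it can help to sanity-check that the edited *-site.xml fragments are well-formed and carry the intended values. The following is an illustrative checker using only the Python standard library; `load_props` is a hypothetical helper, and Hadoop itself loads these files through its own Configuration class:

```python
import xml.etree.ElementTree as ET

# Example fragment matching the mapred-site.xml properties set above.
MAPRED_SITE = """
<configuration>
  <property>
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>mapreduce.map.output.compress.codec</name>
    <value>org.apache.hadoop.io.compress.GzipCodec</value>
  </property>
</configuration>
"""

def load_props(xml_text):
    """Parse a Hadoop-style configuration fragment into a name->value dict.

    Raises xml.etree.ElementTree.ParseError if the XML is malformed, which
    is exactly the kind of typo worth catching before a cluster restart.
    """
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value")
            for p in root.iter("property")}

props = load_props(MAPRED_SITE)
print(props["mapreduce.map.output.compress"])        # true
print(props["mapreduce.map.output.compress.codec"])  # org.apache.hadoop.io.compress.GzipCodec
```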