Chapter 4. Configuring HDFS Compression
This section describes how to configure HDFS compression on Linux.
Linux supports GzipCodec, DefaultCodec, BZip2Codec, LzoCodec, and SnappyCodec. Typically, GzipCodec is used for HDFS compression. Use the following instructions to use GzipCodec.
Option I: To use GzipCodec with a one-time-only job:

   hadoop jar hadoop-examples-1.1.0-SNAPSHOT.jar sort \
       "-Dmapred.compress.map.output=true" \
       "-Dmapred.map.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec" \
       "-Dmapred.output.compress=true" \
       "-Dmapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec" \
       -outKey org.apache.hadoop.io.Text \
       -outValue org.apache.hadoop.io.Text input output
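GzipCodec writes ordinary gzip framing, so output compressed this way can be read back with any gzip tool or library. A minimal sketch of that round trip in Python (the part-file name is illustrative, not produced by the command above):

```python
import gzip

# GzipCodec output is a standard gzip stream; the file name below
# mimics a reducer output part file but is purely illustrative.
records = b"key1\tvalue1\nkey2\tvalue2\n"
with gzip.open("part-r-00000.gz", "wb") as out:
    out.write(records)

with gzip.open("part-r-00000.gz", "rb") as inp:
    recovered = inp.read()

print(recovered == records)  # the round trip preserves the records
```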
Option II: To enable GzipCodec as the default compression:

Edit the core-site.xml file on the NameNode host machine:

   <property>
     <name>io.compression.codecs</name>
     <value>org.apache.hadoop.io.compress.GzipCodec,
            org.apache.hadoop.io.compress.DefaultCodec,
            com.hadoop.compression.lzo.LzoCodec,
            org.apache.hadoop.io.compress.SnappyCodec</value>
     <description>A list of the compression codec classes that can be used for compression/decompression.</description>
   </property>

Edit the mapred-site.xml file on the JobTracker host machine:

   <property>
     <name>mapreduce.map.output.compress</name>
     <value>true</value>
   </property>
   <property>
     <name>mapreduce.map.output.compress.codec</name>
     <value>org.apache.hadoop.io.compress.GzipCodec</value>
   </property>
   <property>
     <name>mapreduce.output.fileoutputformat.compress.type</name>
     <value>BLOCK</value>
   </property>
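A mistyped codec class name in these files fails only at job time, so it can help to parse the edited XML back and inspect the value. A minimal sketch in Python (the codecs_from_site_xml helper is hypothetical, not a Hadoop API; the fragment abbreviates the codec list):

```python
import xml.etree.ElementTree as ET

# A fragment in the *-site.xml layout used above (codec list abbreviated).
CORE_SITE = """<configuration>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,
           org.apache.hadoop.io.compress.DefaultCodec</value>
  </property>
</configuration>"""

def codecs_from_site_xml(xml_text):
    # Hypothetical helper: returns the configured codec classes,
    # or [] if the io.compression.codecs property is absent.
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == "io.compression.codecs":
            return [c.strip() for c in prop.findtext("value").split(",")]
    return []

print(codecs_from_site_xml(CORE_SITE))
```

A check like this catches a missing or misspelled GzipCodec entry before any job is submitted.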
(Optional) Enable the following two configuration parameters to enable job output compression. Edit the mapred-site.xml file on the ResourceManager host machine:

   <property>
     <name>mapreduce.output.fileoutputformat.compress</name>
     <value>true</value>
   </property>
   <property>
     <name>mapreduce.output.fileoutputformat.compress.codec</name>
     <value>org.apache.hadoop.io.compress.GzipCodec</value>
   </property>
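The mapred-site.xml fragments above all follow one pattern, name/value pairs inside <property> elements, so they can be generated from a plain mapping rather than typed by hand. A hedged sketch, with the property names taken from the steps above and the to_property_xml helper being hypothetical:

```python
from xml.sax.saxutils import escape

# The compression settings from the steps above, as a plain mapping.
props = {
    "mapreduce.map.output.compress": "true",
    "mapreduce.map.output.compress.codec":
        "org.apache.hadoop.io.compress.GzipCodec",
    "mapreduce.output.fileoutputformat.compress": "true",
    "mapreduce.output.fileoutputformat.compress.codec":
        "org.apache.hadoop.io.compress.GzipCodec",
}

def to_property_xml(mapping):
    # Hypothetical helper, not part of Hadoop: renders a mapping as the
    # <property> blocks used in *-site.xml files.
    parts = []
    for name, value in sorted(mapping.items()):
        parts.append("  <property>\n"
                     f"    <name>{escape(name)}</name>\n"
                     f"    <value>{escape(value)}</value>\n"
                     "  </property>")
    return "<configuration>\n" + "\n".join(parts) + "\n</configuration>\n"

print(to_property_xml(props))
```

Generating the block this way keeps the four related settings in one place and avoids copy-paste drift between files.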
Restart the cluster using the applicable commands in Controlling HDP Services Manually.

