Using Custom Libraries with Spark
Spark comes equipped with a selection of libraries, including Spark SQL, Spark Streaming, and MLlib.
If you want to use a custom library, such as a compression library or Magellan, you can use one of the following two spark-submit script options:

- The --jars option, which transfers associated .jar files to the cluster. Specify a list of comma-separated .jar files.
- The --packages option, which pulls files directly from Spark packages. This approach requires an internet connection.
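As a rough illustration of how the two options appear on the command line, the following Python sketch assembles a spark-submit argument list. The helper name build_spark_submit_args, the .jar paths, and the package coordinate are hypothetical placeholders; only the --jars and --packages flags and their comma-separated value format come from spark-submit.

```python
def build_spark_submit_args(script, jars=None, packages=None, master="yarn-client"):
    """Assemble a spark-submit argument list (hypothetical helper, not part of Spark)."""
    args = ["spark-submit", "--master", master]
    if jars:
        # --jars takes a single comma-separated string of .jar paths
        args += ["--jars", ",".join(jars)]
    if packages:
        # --packages takes comma-separated Maven coordinates (groupId:artifactId:version)
        args += ["--packages", ",".join(packages)]
    args.append(script)
    return args


if __name__ == "__main__":
    example = build_spark_submit_args(
        "test_read_write.py",
        jars=["/path/to/first.jar", "/path/to/second.jar"],  # placeholder paths
        packages=["com.example:custom-lib:1.0.0"],  # placeholder coordinate
    )
    print(" ".join(example))
```

On a host where spark-submit is on the PATH, the returned list could be passed to subprocess.run(args, check=True). Building the command as a list rather than a single string avoids shell quoting problems with paths.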
For example, you can use the --jars option to add codec files. The following example adds the LZO compression library:
spark-submit --driver-memory 1G \
--executor-memory 1G \
--master yarn-client \
--jars /usr/hdp/2.6.0.3-8/hadoop/lib/hadoop-lzo-0.6.0.2.6.0.3-8.jar \
test_read_write.py

For more information about the two options, see Advanced Dependency Management on the Apache Spark "Submitting Applications" web page.
Note: If you launch a Spark job that references a codec library without specifying where the codec resides, Spark returns an error similar to the following:

Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.

To address this issue, specify the codec file with the --jars option.
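One way to catch a mistyped codec path before submitting the job is a small pre-flight check. The sketch below is an assumption-laden illustration, not part of Spark: the helper name missing_jars is hypothetical, and it only understands local filesystem paths, so any remote URLs in the --jars value would be reported as missing.

```python
import os


def missing_jars(jars_value):
    """Return entries of a comma-separated --jars value that are not local files.

    Hypothetical pre-flight helper; it checks local filesystem paths only.
    """
    paths = [p.strip() for p in jars_value.split(",") if p.strip()]
    return [p for p in paths if not os.path.isfile(p)]


if __name__ == "__main__":
    # Path taken from the spark-submit example in this section
    jars = "/usr/hdp/2.6.0.3-8/hadoop/lib/hadoop-lzo-0.6.0.2.6.0.3-8.jar"
    for path in missing_jars(jars):
        print("Not found locally:", path)
```

An empty result means every listed file exists on the submitting host; it does not confirm that the .jar file actually contains the codec class.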

