Specifying Which Version of Spark to Use
You can install more than one version of Spark on a node. Here are the guidelines for determining which version runs your job:
By default, if only one version of Spark is installed on a node, your job runs with the installed version.
By default, if more than one version of Spark is installed on a node, your job runs with the default version for your HDP package.
The default version for HDP 2.5.5 is Spark 1.6.2.
If more than one version of Spark is installed on a node, you can select which version of Spark runs your job.
To do this, set the SPARK_MAJOR_VERSION environment variable to the desired version before you launch the job. For example, if Spark 1.6.2 and the Spark 2.0 technical preview are both installed on a node and you want to run your job with Spark 2.0, set SPARK_MAJOR_VERSION to 2.
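If you want to confirm which version the client scripts will pick up before submitting a job, a quick check is to print the version. This is a minimal sketch; it assumes the HDP spark-submit wrapper script on your PATH honors SPARK_MAJOR_VERSION, as in the walkthrough below:

# Select Spark 2 for this session, then print the effective Spark version
export SPARK_MAJOR_VERSION=2
spark-submit --version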
The SPARK_MAJOR_VERSION environment variable can be set by any user who logs on to a client machine to run Spark. The scope of the environment variable is local to the user session.
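Because the scope is the user session, you can also set the variable inline for a single command without affecting the rest of your session (a minimal sketch, again assuming the wrapper script under /usr/bin):

# Applies only to this one invocation; later commands in the session
# fall back to the default version for the HDP package
SPARK_MAJOR_VERSION=2 spark-submit --version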
Here is an example for a user who submits jobs using spark-submit under /usr/bin:
1. Navigate to a host where Spark 2.0 is installed.

2. Change to the Spark2 client directory:

   cd /usr/hdp/current/spark2-client/

3. Set the SPARK_MAJOR_VERSION environment variable to 2:

   export SPARK_MAJOR_VERSION=2

4. Run the Spark Pi example:

   ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 1 --driver-memory 512m --executor-memory 512m --executor-cores 1 examples/jars/spark-examples*.jar 10

Note that the path to spark-examples*.jar is different from the path used for Spark 1.x.
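For comparison, a Spark 1.x submission on the same cluster would reference the examples jar under lib/ in the Spark 1 client directory. This is a sketch; the exact jar file name varies by HDP build:

cd /usr/hdp/current/spark-client/
export SPARK_MAJOR_VERSION=1
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 1 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar 10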
To change the environment variable setting later, either remove the environment variable or change its value to the desired version.
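For example, to revert to the default version for your HDP package, or to switch subsequent jobs to Spark 1, for the remainder of the session (a minimal sketch):

# Revert to the package default
unset SPARK_MAJOR_VERSION

# Or point subsequent jobs at Spark 1
export SPARK_MAJOR_VERSION=1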