Specifying Which Version of Spark to Run
More than one version of Spark can run on a node. If your cluster runs Spark 1, you can install Spark 2 and test jobs on Spark 2 alongside your working Spark 1 environment. After verifying that all scripts and jobs run successfully with Spark 2 (including any changes needed for backward compatibility), you can transition jobs from Spark 1 to Spark 2 incrementally. For more information about installing a second version of Spark, see Installing Spark.
Use the following guidelines for determining which version of Spark runs a job by default, and for specifying an alternate version if desired.
By default, if only one version of Spark is installed on a node, your job runs with the installed version.
By default, if more than one version of Spark is installed on a node, your job runs with the default version for your HDP package. In HDP 2.6, the default is Spark version 1.6.
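If you are not sure which versions are installed on a node, you can check the HDP client directories. This is a quick sketch, assuming the standard /usr/hdp/current layout:

    # List the Spark client links; spark-client is Spark 1.x, spark2-client is Spark 2.x
    ls -d /usr/hdp/current/spark*-client

    # Print the version reported by the client on the PATH
    spark-submit --version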
If you want to run jobs on the non-default version of Spark, use one of the following approaches:
- If you use full paths in your scripts, change spark-client to spark2-client. For example, change /usr/hdp/current/spark-client/bin/spark-submit to /usr/hdp/current/spark2-client/bin/spark-submit.

- If you do not use full paths, but instead launch jobs from the path, set the SPARK_MAJOR_VERSION environment variable to the desired version of Spark before you launch the job. For example, if Spark 1.6.3 and Spark 2.0 are both installed on a node and you want to run your job with Spark 2.0, set SPARK_MAJOR_VERSION=2.

You can set SPARK_MAJOR_VERSION in automation scripts that use Spark, or in your manual settings after logging on to the shell; see the sketch following this note.

Note: The SPARK_MAJOR_VERSION environment variable can be set by any user who logs on to a client machine to run Spark. The scope of the environment variable is local to the user session.
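For example, a minimal automation script might export the variable before launching the job. The class name and jar path below are hypothetical placeholders, not part of this document's example:

    #!/bin/bash
    # Run a job under Spark 2 without changing any client paths.
    # com.example.MyJob and my-job.jar are placeholders for your own job.
    export SPARK_MAJOR_VERSION=2
    spark-submit --class com.example.MyJob \
        --master yarn-client \
        /path/to/my-job.jar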
The following example submits a SparkPi job to Spark 2, using spark-submit from the Spark 2 client directory:
1. Navigate to a host where Spark 2.0 is installed.

2. Change to the Spark 2 client directory:

       cd /usr/hdp/current/spark2-client/

3. Set the SPARK_MAJOR_VERSION environment variable to 2:

       export SPARK_MAJOR_VERSION=2

4. Run the Spark Pi example:

       ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
           --master yarn-client \
           --num-executors 1 \
           --driver-memory 512m \
           --executor-memory 512m \
           --executor-cores 1 \
           examples/jars/spark-examples*.jar 10

   Note that the path to spark-examples-*.jar is different than the path used for Spark 1.x.
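To confirm which version of Spark the client picks up, you can ask spark-submit to report its version:

    ./bin/spark-submit --version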
To change the setting later, either remove the environment variable or set it to the desired version.
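For example, to revert to the package default for the current session, or to point explicitly back at Spark 1:

    # Remove the variable so the package default applies again
    unset SPARK_MAJOR_VERSION

    # Or select Spark 1 explicitly
    export SPARK_MAJOR_VERSION=1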

