Configuring Spark
To configure Spark, edit the following configuration files on all nodes that will run Spark jobs. These configuration files reside in the Spark client conf directory, /usr/hdp/current/spark-client/conf, on each node.
- java-opts
- hive-site.xml (only if you plan to use Hive with Spark)
- spark-env.sh
- spark-defaults.conf
Note: The following instructions are for a non-Kerberized cluster.
java-opts
Create a java-opts file in the Spark client /conf directory. Add the following line to the file:
-Dhdp.version=<HDP-version>
For example:
-Dhdp.version=2.3.4.0-3485
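Because this file must exist on every node that runs Spark jobs, the step can be scripted. The following is a minimal sketch rather than part of the official procedure: write_java_opts is a hypothetical helper name, and both the conf directory and the version string are passed in as arguments instead of being detected.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with HDP): writes a one-line java-opts file.
#   $1 = Spark client conf directory
#   $2 = HDP version string, for example 2.3.4.0-3485
write_java_opts() {
    local conf_dir="$1" hdp_version="$2"
    if [ -z "$conf_dir" ] || [ -z "$hdp_version" ]; then
        echo "usage: write_java_opts <conf-dir> <hdp-version>" >&2
        return 1
    fi
    # Overwrite any existing file so it contains only the version option.
    printf '%s\n' "-Dhdp.version=${hdp_version}" > "${conf_dir}/java-opts"
}
```

For instance, calling write_java_opts /usr/hdp/current/spark-client/conf 2.3.4.0-3485 would produce the file shown in the example above.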
hive-site.xml
If you plan to use Hive with Spark, create a hive-site.xml file in the Spark client SPARK_HOME/conf directory. (Note: if you installed the Spark tech preview you can skip this step.)
Edit the file so that it contains only the hive.metastore.uris property. Make sure that the hostname in the URI is the host where the Hive Metastore is running.
For example:
<property>
     <name>hive.metastore.uris</name>
     <value>thrift://c6401.ambari.apache.org:9083</value>
     <description>URI for client to contact metastore server</description>
</property>
spark-env.sh
Create a spark-env.sh file in the Spark client /conf directory, and make sure the file has the following entries:
# Location where log files are stored (default: ${SPARK_HOME}/logs)
# This can be any directory where the spark user has R/W access
export SPARK_LOG_DIR=/var/log/spark
# Location of the pid file (default: /tmp)
# This can be any directory where the spark user has R/W access
export SPARK_PID_DIR=/var/run/spark
These settings are required for starting Spark services (for example, the History Service and the Thrift server). The user who starts Spark services needs read and write permissions on the log directory and the PID directory. By default these files are in the $SPARK_HOME directory, which is typically owned by root in an RPM installation.
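Since the user who starts Spark services needs read and write access to the log and PID directories, it can help to check them before starting anything. The sketch below is an illustration only: check_rw_dir is a hypothetical helper, and it tests access for whichever user runs it, so it should be run as the user who starts the services.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if $1 is an existing directory
# that the current user can both read and write.
check_rw_dir() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir" >&2
        return 1
    fi
    if [ ! -r "$dir" ] || [ ! -w "$dir" ]; then
        echo "no read/write access: $dir" >&2
        return 1
    fi
}
```

With the values from the entries above, the check would be check_rw_dir /var/log/spark && check_rw_dir /var/run/spark.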
We recommend that you set HADOOP_CONF_DIR to the appropriate directory; for example:
export HADOOP_CONF_DIR=/etc/hadoop/conf
This minimizes the work needed to set up environment variables before running Spark applications.
spark-defaults.conf
Edit the spark-defaults.conf file in the Spark client /conf directory. Make sure the following values are specified, including hostname and port. For example:
spark.yarn.historyServer.address c6401.ambari.apache.org:18080
spark.history.ui.port 18080
spark.yarn.services org.apache.spark.deploy.yarn.history.YarnHistoryService
spark.driver.extraJavaOptions -Dhdp.version=2.3.4.0-3371
spark.yarn.am.extraJavaOptions -Dhdp.version=2.3.4.0-3371
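To confirm that none of these properties was left out, the file can be checked with a short script. This is a sketch under stated assumptions: missing_spark_defaults is a hypothetical helper name, and it assumes each property is written in the whitespace-separated "key value" form used in the example, one property per line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints each of the five properties listed above that
# is absent from the spark-defaults.conf file given as $1.
# Returns non-zero if at least one property is missing.
missing_spark_defaults() {
    local conf_file="$1" key missing=0
    for key in \
        spark.yarn.historyServer.address \
        spark.history.ui.port \
        spark.yarn.services \
        spark.driver.extraJavaOptions \
        spark.yarn.am.extraJavaOptions
    do
        # A property counts as present when a line has the key as its
        # first field and at least one more field as its value.
        if ! awk -v k="$key" '$1 == k && NF >= 2 { found = 1 } END { exit !found }' "$conf_file"; then
            echo "$key"
            missing=1
        fi
    done
    return "$missing"
}
```

Running missing_spark_defaults /usr/hdp/current/spark-client/conf/spark-defaults.conf prints nothing and returns zero when all five properties are set.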
Create a Spark user
To use the Spark History Service, run Hive queries as the spark user, or run Spark jobs, the associated user must have sufficient HDFS access. One way of ensuring this is to add the user to the hdfs group.
The following example creates a spark user:
- Create the spark user on all nodes. Add it to the hdfs group.
  useradd spark
  This command is only required for tarball spark installs, not rpm-based installs.
  usermod -a -G hdfs spark
- Create the spark user directory under /user/spark:
  sudo su $HDFS_USER
  hdfs dfs -mkdir -p /user/spark
  hdfs dfs -chown spark:spark /user/spark
  hdfs dfs -chmod -R 755 /user/spark
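After running these steps on a node, the group membership can be verified locally. The following sketch is not part of the original procedure: user_in_group is a hypothetical helper that relies only on the standard id command, and it assumes group names contain no whitespace.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if user $1 belongs to group $2
# according to the local account database.
user_in_group() {
    local user="$1" group="$2" g
    # id -nG prints the names of all groups the user belongs to.
    for g in $(id -nG "$user" 2>/dev/null); do
        if [ "$g" = "$group" ]; then
            return 0
        fi
    done
    return 1
}
```

For the example above, user_in_group spark hdfs should succeed on every node. The HDFS directory itself can be inspected separately with hdfs dfs -ls -d /user/spark.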

