Chapter 5. Installing Spark with Kerberos
Spark jobs are submitted to a Hadoop cluster as YARN jobs. A developer typically creates a Spark application in a local environment and tests it on a single-node Spark Standalone cluster on their workstation.
When a job is ready to run in a production environment, there are a few additional steps if the cluster is Kerberized:
The Spark History Server daemon needs a Kerberos account and keytab to run in a Kerberized cluster.
When you enable Kerberos for a Hadoop cluster with Ambari, Ambari configures Kerberos for the Spark History Server and automatically creates a Kerberos account and keytab for it. For more information, see Configuring Ambari and Hadoop for Kerberos.
If you are not using Ambari, or if you plan to enable Kerberos manually for the Spark History Server, see Creating Service Principals and Keytab Files for HDP in the Manual Install Guide.
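For manual setup, the sketch below shows the general shape: create a principal and keytab for the History Server, then point the server at them. The host blue1, realm EXAMPLE.COM, and keytab path are placeholders; the spark.history.kerberos.* properties are standard Spark History Server settings.
# Placeholders: host blue1, realm EXAMPLE.COM, keytab path
kadmin.local -q "addprinc -randkey spark/blue1@EXAMPLE.COM"
kadmin.local -q "xst -k /etc/security/keytabs/spark.headless.keytab spark/blue1@EXAMPLE.COM"
# Then, in spark-defaults.conf (standard Spark History Server settings):
spark.history.kerberos.enabled true
spark.history.kerberos.principal spark/blue1@EXAMPLE.COM
spark.history.kerberos.keytab /etc/security/keytabs/spark.headless.keytab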
To submit Spark jobs in a Kerberized cluster, the account (or person) submitting jobs needs a Kerberos account and keytab.
When access is authenticated without human interaction, as happens for processes that submit job requests, the process uses a headless keytab. The security risk is mitigated by ensuring that only the service that is supposed to use the headless keytab has permission to read it.
An end user should use their own keytab when submitting a Spark job.
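To make the distinction concrete, compare the two authentication styles (the principal names here are illustrative):
# An end user authenticates interactively and is prompted for a password:
kinit alice@EXAMPLE.COM
# A headless process authenticates non-interactively from its keytab:
kinit -kt /etc/security/keytabs/spark.keytab spark/blue1@EXAMPLE.COM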
Setting Up Principals and Keytabs for End User Access to Spark
In the following example, user $USERNAME runs the Spark Pi job in a
Kerberos-enabled environment:
su $USERNAME
kinit $USERNAME@YOUR-LOCAL-REALM.COM
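klist    # optional check: confirm the ticket-granting ticket was obtained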
cd /usr/hdp/current/spark-client/
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn-cluster \
--num-executors 3 \
--driver-memory 512m \
--executor-memory 512m \
--executor-cores 1 \
lib/spark-examples*.jar 10
Setting Up Service Principals and Keytabs for Processes Submitting Spark Jobs
The following example shows the creation and use of a headless keytab for a
spark service user account that will submit Spark jobs on node
blue1:
1. Create a Kerberos service principal for user spark:
   kadmin.local -q "addprinc -randkey spark/blue1@EXAMPLE.COM"
2. Create the keytab:
   kadmin.local -q "xst -k /etc/security/keytabs/spark.keytab spark/blue1@EXAMPLE.COM"
3. Create a spark user and add it to the hadoop group. (Do this for every node of your cluster.)
   useradd spark -g hadoop
4. Make spark the owner of the newly-created keytab:
   chown spark:hadoop /etc/security/keytabs/spark.keytab
5. Limit access: make sure user spark is the only user with access to the keytab:
   chmod 400 /etc/security/keytabs/spark.keytab
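Before running jobs, it can be worth verifying the keytab and its permissions (paths as above; klist is part of the standard Kerberos client tools):
# List the principals stored in the keytab:
klist -kt /etc/security/keytabs/spark.keytab
# Confirm ownership and mode (expect owner spark:hadoop and mode 400):
ls -l /etc/security/keytabs/spark.keytab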
In the following steps, user spark runs the Spark Pi example in a
Kerberos-enabled environment:
su spark
kinit -kt /etc/security/keytabs/spark.keytab spark/blue1@EXAMPLE.COM
cd /usr/hdp/current/spark-client/
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn-cluster \
--num-executors 1 \
--driver-memory 512m \
--executor-memory 512m \
--executor-cores 1 \
lib/spark-examples*.jar 10
Accessing the Hive Metastore in Secure Mode
Requirements for accessing the Hive Metastore in secure mode (with Kerberos):
The Spark thrift server must be co-located with the Hive thrift server.
The spark user must be able to access the Hive keytab.
In yarn-client mode on a secure cluster, you can use HiveContext to access the Hive Metastore. (HiveContext is not supported for yarn-cluster mode on a secure cluster.)
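As a sketch of the yarn-client path, the session below starts spark-shell and queries the Metastore through HiveContext; the SHOW TABLES query is only an illustrative example, and the code reflects the Spark 1.x API, where HiveContext lives in org.apache.spark.sql.hive.
cd /usr/hdp/current/spark-client/
./bin/spark-shell --master yarn-client
# Then, at the scala> prompt:
#   val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
#   hiveContext.sql("SHOW TABLES").collect().foreach(println)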

