Configuring Cluster Dynamic Resource Allocation Manually
To configure a cluster to run Spark applications with dynamic resource allocation:
1. Add the following properties to the spark-defaults.conf file associated with your Spark installation. (For general Spark applications, this file typically resides at $SPARK_HOME/conf/spark-defaults.conf.)

   - Set spark.dynamicAllocation.enabled to true.
   - Set spark.shuffle.service.enabled to true.
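
   For reference, a minimal sketch of these two entries as they would appear in spark-defaults.conf (the file holds whitespace-separated key/value pairs, with # starting a comment):

      # Enable dynamic executor allocation and the external shuffle
      # service that it depends on.
      spark.dynamicAllocation.enabled   true
      spark.shuffle.service.enabled     true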
2. (Optional) The following properties specify a starting point and range for the number of executors. Note that initialExecutors must be greater than or equal to minExecutors, and less than or equal to maxExecutors:

   - spark.dynamicAllocation.initialExecutors
   - spark.dynamicAllocation.minExecutors
   - spark.dynamicAllocation.maxExecutors

   For a description of each property, see Dynamic Resource Allocation Properties.
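
   As an illustration only (the values below are placeholders, not recommendations; choose numbers that fit your workload and cluster capacity), an application could start with 2 executors and scale between 1 and 20:

      # Placeholder values; must satisfy
      # minExecutors <= initialExecutors <= maxExecutors.
      spark.dynamicAllocation.initialExecutors   2
      spark.dynamicAllocation.minExecutors       1
      spark.dynamicAllocation.maxExecutors       20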
3. Start the shuffle service on each worker node in the cluster. (The shuffle service runs as an auxiliary service of the NodeManager.)

   a. In the yarn-site.xml file on each node, add spark_shuffle to yarn.nodemanager.aux-services, then set yarn.nodemanager.aux-services.spark_shuffle.class to org.apache.spark.network.yarn.YarnShuffleService. (A sketch of these entries follows this list.)
   b. Review and, if necessary, edit the spark.shuffle.service.* configuration settings. For more information, see the Apache Spark Shuffle Behavior documentation.
   c. Restart all NodeManagers in your cluster.
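
   The yarn-site.xml entries from step a would look roughly like the following. This sketch assumes the Spark YARN shuffle JAR is already on the NodeManager classpath; if yarn.nodemanager.aux-services already lists other services (for example, mapreduce_shuffle), append spark_shuffle to the existing comma-separated value rather than replacing it:

      <!-- Register the Spark shuffle service as a NodeManager aux service.
           Assumes the Spark YARN shuffle JAR is on the NodeManager classpath. -->
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle,spark_shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
        <value>org.apache.spark.network.yarn.YarnShuffleService</value>
      </property>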

