Configuring Cluster Dynamic Resource Allocation Manually
To configure a cluster to run Spark jobs with dynamic resource allocation, complete the following steps:
1. Add the following properties to the spark-defaults.conf file associated with your Spark installation (typically in the $SPARK_HOME/conf directory):
   - Set spark.dynamicAllocation.enabled to true.
   - Set spark.shuffle.service.enabled to true.
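   Together, these entries might look like the following in spark-defaults.conf (a minimal sketch; the column alignment is cosmetic):

      # Enable dynamic executor allocation and the external shuffle service
      spark.dynamicAllocation.enabled   true
      spark.shuffle.service.enabled     true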
2. (Optional) To specify a starting point and range for the number of executors, use the following properties:
   - spark.dynamicAllocation.initialExecutors
   - spark.dynamicAllocation.minExecutors
   - spark.dynamicAllocation.maxExecutors
   Note that initialExecutors must be greater than or equal to minExecutors, and less than or equal to maxExecutors. For a description of each property, see Dynamic Resource Allocation Properties.
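   For example, the following entries start each application with two executors and let dynamic allocation scale between one and ten. The values shown are illustrative assumptions; choose numbers that match your workload and cluster capacity:

      # Illustrative values only; they must satisfy
      # minExecutors <= initialExecutors <= maxExecutors
      spark.dynamicAllocation.initialExecutors   2
      spark.dynamicAllocation.minExecutors       1
      spark.dynamicAllocation.maxExecutors       10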
3. Start the shuffle service on each worker node in the cluster:
   a. In the yarn-site.xml file on each node, add spark_shuffle to yarn.nodemanager.aux-services, and then set yarn.nodemanager.aux-services.spark_shuffle.class to org.apache.spark.network.yarn.YarnShuffleService (see the example after this list).
   b. Review and, if necessary, edit the spark.shuffle.service.* configuration settings. For more information, see the Apache Spark Shuffle Behavior documentation.
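   The corresponding yarn-site.xml entries might look like the following sketch. Keep any auxiliary services that are already configured; mapreduce_shuffle is shown here as a typical pre-existing entry, with spark_shuffle appended to the comma-separated list:

      <!-- Register the Spark shuffle service as a NodeManager auxiliary service -->
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle,spark_shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
        <value>org.apache.spark.network.yarn.YarnShuffleService</value>
      </property>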
4. Restart all NodeManagers in your cluster.
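   How you restart NodeManagers depends on how the cluster is managed; if you use a management tool such as Ambari, restart them through that tool. On an unmanaged Hadoop 2.x installation, one way is to run the following on each worker node (the script name and location vary by Hadoop version; Hadoop 3.x uses yarn --daemon stop/start nodemanager instead):

      # Hadoop 2.x style; run on each worker node
      $HADOOP_HOME/sbin/yarn-daemon.sh stop nodemanager
      $HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager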

