Configure GPU Scheduling and Isolation
On an Ambari cluster, you can configure GPU scheduling and isolation from the Ambari dashboard. On a non-Ambari
cluster, you must configure certain properties in the
capacity-scheduler.xml, resource-types.xml, and
yarn-site.xml files. Currently only Nvidia GPUs are supported in
YARN.
- The Nvidia drivers must be installed on every YARN NodeManager host.
Enable GPU scheduling and isolation on an Ambari cluster
- Select YARN > CONFIGS on the Ambari dashboard.
- Click GPU Scheduling and Isolation under GPU.
- In the Absolute path of nvidia-smi on NodeManagers field, enter the absolute path to the nvidia-smi GPU discovery executable. For example: /usr/local/bin/nvidia-smi
- Click Save, and then restart all the cluster components that require a restart.
If the NodeManager log shows that GPU discovery failed, as in the following messages:
INFO gpu.GpuDiscoverer (GpuDiscoverer.java:initialize(240)) - Trying to discover GPU information ...
WARN gpu.GpuDiscoverer (GpuDiscoverer.java:initialize(247)) - Failed to discover GPU information from system, exception message:ExitCodeException exitCode=12: continue...
export LD_LIBRARY_PATH in the yarn-env.sh file using the following command:
export LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64:$LD_LIBRARY_PATH
Enable GPU scheduling and isolation on a non-Ambari cluster
DominantResourceCalculator must be configured before you enable
GPU scheduling and isolation. Configure the following property in
the /etc/hadoop/conf/capacity-scheduler.xml file:
Property:
yarn.scheduler.capacity.resource-calculator
Value:
org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
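Written as an entry in capacity-scheduler.xml, the property and value above would look roughly like the following. This is a sketch in the same form as the examples given for the other files; on a real cluster, merge the property into the existing configuration element rather than replacing the file.

```xml
<configuration>
  <!-- Sketch: DominantResourceCalculator lets the scheduler account for
       resource types beyond memory, which GPU scheduling relies on -->
  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
  </property>
</configuration>
```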
- Enable GPU scheduling in the /etc/hadoop/conf/resource-types.xml file on the ResourceManager and NodeManager hosts:
  Property:
  yarn.resource-types
  Value:
  yarn.io/gpu
  Example:
  <configuration>
    <property>
      <name>yarn.resource-types</name>
      <value>yarn.io/gpu</value>
    </property>
  </configuration>
- Enable GPU isolation in the /etc/hadoop/conf/yarn-site.xml file on the NodeManager host:
  Property:
  yarn.nodemanager.resource-plugins
  Value:
  yarn.io/gpu
  Example:
  <configuration>
    <property>
      <name>yarn.nodemanager.resource-plugins</name>
      <value>yarn.io/gpu</value>
    </property>
  </configuration>
- Set the following advanced properties in the /etc/hadoop/conf/yarn-site.xml file on the NodeManager host:
  - To allow GPU devices:
    Property:
    yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices
    Value:
    auto
    Note: The auto setting enables YARN to automatically detect and manage GPU devices. For other options, see YARN-7223.
  - To allow the YARN NodeManager to locate the discovery executable:
    Property:
    yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables
    Value:
    <absolute_path_to_nvidia-smi_binary>
    Note: Supports only nvidia-smi.
    Example:
    /usr/local/bin/nvidia-smi
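Taken together, the two advanced properties could appear in yarn-site.xml as in the sketch below. The nvidia-smi path is the example value used in this section, not a default; substitute the actual location on your NodeManager hosts.

```xml
<configuration>
  <!-- Let YARN detect and manage GPU devices automatically -->
  <property>
    <name>yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices</name>
    <value>auto</value>
  </property>
  <!-- Assumed example path; replace with the absolute path to nvidia-smi on the host -->
  <property>
    <name>yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables</name>
    <value>/usr/local/bin/nvidia-smi</value>
  </property>
</configuration>
```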
- Set the following property in the /etc/hadoop/conf/yarn-site.xml file on the NodeManager host to automatically mount cgroup sub-devices:
  Property:
  yarn.nodemanager.linux-container-executor.cgroups.mount
  Value:
  true
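In the same form as the earlier examples, the cgroups mount setting would look roughly like this in yarn-site.xml. This is a sketch of the single property named above; add it to the existing configuration element alongside the other NodeManager properties.

```xml
<configuration>
  <!-- Ask the NodeManager to mount cgroup sub-devices automatically -->
  <property>
    <name>yarn.nodemanager.linux-container-executor.cgroups.mount</name>
    <value>true</value>
  </property>
</configuration>
```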
- Set the following configuration in the /etc/hadoop/conf/container-executor.cfg file to run GPU applications in a non-Docker environment:
  - In the GPU section, set:
    Property:
    module.enabled=true
  - In the cgroups section, set:
    Property:
    root=/sys/fs/cgroup
    Note: This should be the same as yarn.nodemanager.linux-container-executor.cgroups.mount-path in the yarn-site.xml file.
    Property:
    yarn-hierarchy=yarn
    Note: This should be the same as yarn.nodemanager.linux-container-executor.cgroups.hierarchy in the yarn-site.xml file.
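The two notes above tie the cgroups values in container-executor.cfg to properties in yarn-site.xml. As a sketch, assuming the values shown here (/sys/fs/cgroup and yarn) are the ones in use, the matching yarn-site.xml entries would be:

```xml
<configuration>
  <!-- Must match root= in the cgroups section of container-executor.cfg -->
  <property>
    <name>yarn.nodemanager.linux-container-executor.cgroups.mount-path</name>
    <value>/sys/fs/cgroup</value>
  </property>
  <!-- Must match yarn-hierarchy= in the cgroups section of container-executor.cfg -->
  <property>
    <name>yarn.nodemanager.linux-container-executor.cgroups.hierarchy</name>
    <value>yarn</value>
  </property>
</configuration>
```

If either value differs between the two files, bring them into agreement before restarting the NodeManager.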

