CUDA Information
- cuda_info.bsh
Determine easy-to-use capabilities for CUDA devices
There are many versions of CUDA, CUDA card architectures, and so on. Knowing how to compile for a specific card is hard enough, but it is very difficult to know what the right architectures are for your specific card and what the limitations are based on your version of CUDA, your NVIDIA driver, etc. This script helps determine what versions you have and suggests what architectures to use. This is good enough for an automated solution to get up and running, but these suggestions are not absolute. You may find a fine-tuned configuration that works better on a case-by-case basis.
A number of variables are set by the following functions.
- CUDA_VERSION
The version of CUDA being used. These functions will attempt to discover the CUDA Toolkit in commonly known locations, gathering a list of all discovered CUDAs in the sorted array CUDA_VERSIONS. Then, the highest capable version of CUDA is picked and set to CUDA_VERSION.
CUDA_VERSION can optionally be set to a specific version (e.g. “7.5.13”), in which case other CUDA versions will not be discovered and CUDA_VERSIONS will not be populated.
Note
Currently, CUDA Toolkits are discovered by checking the system PATH and /usr/local/cuda*/bin/ directories for the nvcc executable. More paths should be added to this file as they become necessary.
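For instance, a minimal sketch of pinning the toolkit version before discovery (the version string is illustrative, and assumes cuda_info.bsh is in the current directory):
CUDA_VERSION=11.4.152  # optional: pin a specific toolkit; omit to auto-discover
source cuda_info.bsh
discover_cuda_versions
echo "${CUDA_VERSION}"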
- CUDA_VERSIONS
List of all CUDA_VERSIONs discovered.
- CUDA_ARCHES
An array of CUDA “virtual” instruction sets supported by CUDA. Every version of CUDA (nvcc) has a set of “virtual” compute_xx architectures (ISAs) that it can build against when compiling code for “real” sm_xx architectures.
This array contains the list of the compute (virtual) architectures supported by the CUDA_VERSION version of CUDA, as an array of two-digit numbers.
Example
$ echo "${CUDA_ARCHES[@]}"
20 30 32 35 37 50 52 53 60 61 62
Adding the periods to the architecture version number:
y=()
for x in ${CUDA_ARCHES[@]+"${CUDA_ARCHES[@]}"}; do
  y+=("${x:0:${#x}-1}.${x:${#x}-1:1}")
done
$ echo "${y[@]}"
2.0 3.0 3.2 3.5 3.7 5.0 5.2 5.3 6.0 6.1 6.2
- CUDA_CODES
An array of CUDA “real” instruction sets supported by CUDA. Every version of CUDA (nvcc) has a set of “real” sm_xx architectures that it can assemble native (CUDA binary) code for.
This array contains a list of the sm (real) architectures supported by the CUDA_VERSION version of CUDA, as an array of two-digit numbers.
- CUDA_LTOS
Starting in CUDA 11, NVIDIA introduced Link Time Optimization (lto_) architectures in addition to the already existing sm_ and compute_ architectures.
Separately compiled kernels may not perform as well as kernels compiled together with the rest of the executable code, because code cannot be inlined across files. Link-time optimization is a way to recover that performance.
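As a hedged sketch of how these lto_ targets are typically used with nvcc (CUDA 11.2 or newer; kernel.cu and main.cu are hypothetical files):
nvcc -dc -dlto -arch=compute_80 kernel.cu -o kernel.o  # embed LTO-capable IR
nvcc -dc -dlto -arch=compute_80 main.cu -o main.o
nvcc -dlto -arch=sm_80 kernel.o main.o -o app          # optimize across files at device link time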
- CUDA_MINIMAL_DRIVER_VERSION
Every version of CUDA has a minimum version of the NVIDIA graphics-card driver that must be installed in order to support that version of CUDA. This was largely undocumented until CUDA 10 came out, despite being obviously important. This variable is set to the minimum required version of the NVIDIA driver for the CUDA_VERSION version of CUDA, as best as we’ve been able to determine.
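A minimal sketch of checking the installed driver against this variable, assuming nvidia-smi is on the PATH and discovery has already run:
driver_version="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
# sort -V orders version strings; if the installed driver sorts strictly before
# the minimum, it is too old for this CUDA_VERSION
if [ "${driver_version}" != "${CUDA_MINIMAL_DRIVER_VERSION}" ] && \
   [ "$(printf '%s\n' "${driver_version}" "${CUDA_MINIMAL_DRIVER_VERSION}" | sort -V | head -n 1)" = "${driver_version}" ]; then
  echo "Driver ${driver_version} is older than the required ${CUDA_MINIMAL_DRIVER_VERSION}" >&2
fi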
- CUDA_COMPATIBLE_DRIVER_VERSION
Starting in CUDA 11, NVIDIA introduced a cuda-compat package that can be installed on machines with older NVIDIA drivers for increased compatibility with those drivers. This is the minimum supported driver version that will handle the cuda-compat runtime.
- CUDA_DEPRECATED
Some versions of CUDA support old instruction sets but print a deprecation warning. For those versions of CUDA, a CUDA_DEPRECATED array is defined, listing the two-digit architectures that are supported yet deprecated in the CUDA_VERSION version of CUDA.
- CUDA_FORWARD_PTX
PTX arch for newer CUDA cards. In situations where you are making portable fatbinaries, you should compile for every architecture. However, in order to future-proof your fatbin for architectures newer than your current version of CUDA supports, you will need to compile to a pure virtual architecture using the PTX feature so that the real architecture can be JIT (Just-In-Time) compiled.
CUDA_FORWARD_PTX identifies the fullest-featured PTX architecture so that you can choose to add it to your builds.
- CUDA_CARDS
The names of the CUDA cards. Exact values may vary, e.g. “Titan X (Pascal)” vs “Titan Xp”.
- CUDA_CARD_FAMILIES
The family name of each card in CUDA_CARDS.
- CUDA_CARD_ARCHES
The specific CUDA architecture a card natively supports, for each card in CUDA_CARDS.
- CUDA_SUGGESTED_ARCHES
Suggested “virtual” architectures to compile for. Instead of compiling for every architecture that the CUDA_VERSION version of CUDA supports, CUDA_SUGGESTED_ARCHES is the intersection of CUDA_CARD_ARCHES and CUDA_ARCHES, so that you compile only for your cards.
- CUDA_SUGGESTED_CODES
Suggested “real” architectures to compile for. Instead of compiling for every architecture that the CUDA_VERSION version of CUDA supports, CUDA_SUGGESTED_CODES is the intersection of CUDA_CARD_ARCHES and CUDA_CODES, so that you compile only for your cards.
- CUDA_SUGGESTED_PTX
Suggested PTX architectures to compile for. If your graphics card is too new for the CUDA_VERSION version of CUDA, you will need to compile to a pure virtual architecture (by embedding PTX code in the fatbinary) in order to use it. That way, the real architecture can be JIT (Just-In-Time) compiled at runtime.
CUDA_SUGGESTED_PTX identifies the PTX architectures you need to run on newer (unrecognized) cards. You can choose to add them to your builds.
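As a sketch of putting the suggested arrays to use (assuming discovery has already populated them; my_program.cu is a hypothetical file):
gencodes=()
for arch in ${CUDA_SUGGESTED_CODES[@]+"${CUDA_SUGGESTED_CODES[@]}"}; do
  gencodes+=("-gencode=arch=compute_${arch},code=sm_${arch}")        # native code for your cards
done
for arch in ${CUDA_SUGGESTED_PTX[@]+"${CUDA_SUGGESTED_PTX[@]}"}; do
  gencodes+=("-gencode=arch=compute_${arch},code=compute_${arch}")   # PTX for JIT on newer cards
done
nvcc ${gencodes[@]+"${gencodes[@]}"} -o my_program my_program.cu
The nvcc_gencodes function below generates similar statements for you.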
- discover_cuda_versions
Find CUDA development kits
Note
Will not work on macOS if the machine has NVIDIA hardware and two or more versions of CUDA installed.
- nvidia_smi_cuda_version
Starting at NVIDIA Driver version 410.72, nvidia-smi started listing the maximum version of CUDA that the driver supports. This function can be used to parse that.
- Parameters:
NVIDIA_SMI - Optional path to a specific nvidia-smi. Default: nvidia-smi (using the system path)
- Output:
stdout - Echoes out the version of CUDA that nvidia-smi says it supports. If this version of nvidia-smi does not say, nothing is output.
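A hedged usage sketch:
smi_cuda="$(nvidia_smi_cuda_version)"
if [ -n "${smi_cuda}" ]; then
  echo "Driver supports up to CUDA ${smi_cuda}"
fi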
- device_query_cuda_capability
- Output:
- Parameters:
DEVICE_QUERY - Optional path to a specific deviceQuery. Default: deviceQuery (using the system path)
- Return Value:
0 - No errors
Non-zero - If the output of deviceQuery does not contain any CUDA capabilities, then something likely failed. This is usually caused by running a container without the NVIDIA extensions or having an insufficient NVIDIA driver for the version of CUDA deviceQuery is compiled for.
Device query is one of the sample programs in the CUDA Samples that prints out useful information about the connected CUDA devices.
The deviceQuery executable is compiled from source code typically found in /usr/local/cuda/samples/1_Utilities/deviceQuery/, but can be downloaded precompiled for Linux from https://goo.gl/equvX3
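A hedged usage sketch (the deviceQuery path shown is illustrative):
if DEVICE_QUERY=/usr/local/cuda/extras/demo_suite/deviceQuery device_query_cuda_capability; then
  echo "CUDA capabilities detected"
else
  echo "No capabilities found; check the driver or container setup" >&2
fi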
- nvidia_smi_cuda_capability
Starting at NVIDIA Driver version >500, nvidia-smi provides the compute_cap query property. This function can be used to parse that.
- Output:
- Parameters:
NVIDIA_SMI - Optional path to a specific nvidia-smi. Default: nvidia-smi (using the system path)
- Return Value:
0 - No errors
Non-zero - If the output of nvidia-smi does not contain any CUDA cards, then something likely failed.
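For reference, a sketch of the underlying query that this function parses (the capability shown is illustrative):
$ nvidia-smi --query-gpu=compute_cap --format=csv,noheader
8.6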
- wmic_cuda_capability
- Output:
- Parameters:
WMIC - Optional path to a specific wmic. Default: wmic.exe (using the system path)
- Return Value:
0 - No errors
Non-zero - If the output of wmic.exe does not contain any CUDA cards, then something likely failed.
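The underlying query is likely along these lines; card names are then matched against an internal lookup table:
wmic path Win32_VideoController get Name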
- nvidia_docker_plugin_cuda_capability
The deprecated nvidia-docker v1 API actually hosted GPU information, which was useful for determining the CUDA Arches. This is rarely used now.
- Parameters:
[NV_HOST] - Environment variable to optionally specify a custom NVIDIA host for the GPUs. Default: http://localhost:3476
- Output:
- cuda_arch_to_cuda_family
Determines CUDA family names based on the CUDA arches stored in CUDA_CARD_ARCHES
- Parameters:
- Output:
- discover_cuda_info
Get CUDA info about each card
- cuda_capabilities
- Parameters:
$1 - The CUDA version to check against. This is typically the nvcc version, but the docs version should work too. In a pinch, the CUDA version will work, to some extent.
- Output:
CUDA_LTOS - Starting in CUDA 11, Link-Time Optimization architectures were added
CUDA_MINIMAL_DRIVER_VERSION - Minimum required NVIDIA driver needed to run this version of CUDA
CUDA_COMPATIBLE_DRIVER_VERSION - Starting in CUDA 10, you can install a cuda-compat package that supports a newer version of the CUDA library on older drivers, up to that version
Determine compiler capabilities for a specific CDK (CUDA Development Kit)
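A hedged usage sketch (the version string is illustrative):
cuda_capabilities 11.2.0  # expected to populate CUDA_ARCHES, CUDA_CODES, and the variables above
echo "${CUDA_ARCHES[@]}"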
- suggested_architectures
- Output:
Calculate suggested architectures
- cmake_cuda_flags
- Parameters:
- Output:
stdout - echoes out the value of the target_CUDA_architectures
Generate CUDA flags for CMake
Modern CMake installs include a FindCUDA.cmake script, which calls the select_compute_arch.cmake script (https://goo.gl/uZvAjR). It uses a limited version of the tables that cuda_info.bsh uses and is prone to being out of date.
This function will calculate the suggested value of target_CUDA_architectures for CMake’s:
FindCUDA.cmake:select_compute_arch.cmake:CUDA_SELECT_NVCC_ARCH_FLAGS
You will need to find where this is used and set the variable accordingly.
Example
For example, PyTorch’s CMake contains:
CUDA_SELECT_NVCC_ARCH_FLAGS(NVCC_FLAGS_EXTRA $ENV{TORCH_CUDA_ARCH_LIST})
Setting the environment variable TORCH_CUDA_ARCH_LIST to the output of cmake_cuda_flags will result in using the desired CUDA architecture and code versions.
To add the CUDA_FORWARD_PTX, run:
CUDA_SUGGESTED_PTX+=(${CUDA_FORWARD_PTX})
before calling cmake_cuda_flags.
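A hedged sketch of wiring this into a PyTorch source build (the install command is illustrative):
CUDA_SUGGESTED_PTX+=(${CUDA_FORWARD_PTX})
export TORCH_CUDA_ARCH_LIST="$(cmake_cuda_flags)"
python setup.py install  # PyTorch's CMake reads TORCH_CUDA_ARCH_LIST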
- cmake_cuda_architectures
- Parameters:
- Output:
stdout - semi-colon delimited string of CUDA capabilities for cmake
Generate CUDA capabilities suitable for the CMAKE_CUDA_ARCHITECTURES CMake variable.
See cmake docs for more information https://cmake.org/cmake/help/latest/variable/CMAKE_CUDA_ARCHITECTURES.html
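For example, a hedged sketch of passing this to a CMake configure step:
cmake -DCMAKE_CUDA_ARCHITECTURES="$(cmake_cuda_architectures)" -S . -B build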
- torch_cuda_arch_list
- Parameters:
- Output:
stdout - comma delimited string of CUDA architectures for pytorch
Generate CUDA capabilities suitable for the TORCH_CUDA_ARCH_LIST environment variable.
See pytorch docs for more information https://pytorch.org/docs/stable/cpp_extension.html#torch.utils.cpp_extension.CUDAExtension
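A hedged usage sketch (my_cuda_extension is a hypothetical package):
export TORCH_CUDA_ARCH_LIST="$(torch_cuda_arch_list)"
pip install ./my_cuda_extension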
- torch_cuda_arch_list_from_cmake
- Arguments:
$1 - Delimited string of CUDA architectures
- Output:
stdout - String containing the TORCH_CUDA_ARCH_LIST statement
Create TORCH_CUDA_ARCH_LIST from a CMAKE_CUDA_ARCHITECTURES statement.
Example
For example, an input of "80-real 86-virtual" would produce the output "8.0 8.6+PTX"
- tcnn_cuda_architectures
- Parameters:
- Output:
stdout - comma delimited string of CUDA architectures for tiny-cuda-nn
Generate CUDA architectures for tiny-cuda-nn, suitable for the TCNN_CUDA_ARCHITECTURES environment variable. Note that tiny-cuda-nn will always build both binaries and PTX intermediate code for each specified architecture.
- tcnn_cuda_architectures_from_cmake
- Arguments:
$1 - Delimited string of CUDA architectures
- Output:
stdout - String containing the TCNN_CUDA_ARCHITECTURES statement
Create TCNN_CUDA_ARCHITECTURES from a CMAKE_CUDA_ARCHITECTURES statement.
Example
For example, an input of "80-real 86-virtual" would produce the output "80 86"
- nvcc_gencodes
- Arguments:
$1 - Delimited string of CUDA architectures
- Output:
stdout - string containing nvcc gencode statements
Create nvcc gencode statements from the input string of CUDA capabilities.
Example
For example, an input of "75 86+PTX" would produce the output:
-gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_86,code=compute_86
- discover_cuda_all
Helper function to call all the discovery code in one call.
There are two methods for CUDA device discovery (tried in order):
- Using deviceQuery (available here: https://goo.gl/ocBgPU); looks for ${DEVICE_QUERY-deviceQuery} on the PATH
- Using the nvidia-docker-plugin to get and parse GPU information, discovered using either NV_HOST or by checking to see if nvidia-docker-plugin is running locally using pgrep or ps
When running in a docker, deviceQuery is the preferred method. NV_HOST could be used, but that involves telling the docker the IP of the host, or using a shared network mode in order to use localhost (which is not recommended for production). Attempting to discover nvidia-docker-plugin will not work in a docker.
Note
If deviceQuery is not used, then an internal lookup table is used, but it only supports Tesla, Fermi, Kepler, Maxwell, Pascal, Volta, Turing, and Ampere. Additional family names need to be added as they are released.
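A minimal usage sketch, assuming cuda_info.bsh has been sourced:
discover_cuda_all
echo "${CUDA_VERSION}"
echo "${CUDA_SUGGESTED_ARCHES[@]}"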
- has_gpu_device
Tells you if the computer has any NVIDIA GPUs. This is checking if they physically exist, not if they are currently working or have working drivers. This is usually used to determine if you need to install drivers, not if you have GPUs ready to use.
- Return Value:
0 - The machine has an NVIDIA GPU.
1 - No NVIDIA GPUs were found.
- is_gpu_setup
Checks to see if you have any GPUs installed and working.
- Return Value:
0 - There is at least one NVIDIA GPU available to use.
1 - No NVIDIA GPUs were available.
If there is a problem with the NVIDIA drivers resulting in the kernel module not loading, then this check will return false.
If you are in a docker and did not route the GPU in correctly, it will return false.
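A hedged sketch combining the two checks (the remediation message is illustrative):
if has_gpu_device && ! is_gpu_setup; then
  echo "GPU present but not usable; check the driver install or docker GPU routing" >&2
fi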