CUDA Information

cuda_info.bsh

Determine easy-to-use capabilities of CUDA devices

There are many versions of CUDA, many CUDA card architectures, and so on. Knowing how to compile for a specific card is hard enough, and it is even harder to know which architectures are right for your specific card and what the limitations are given your version of CUDA, NVIDIA driver, etc. This script helps determine what versions you have and suggests what architectures to use. That is good enough for an automated solution to get up and running, but the suggestions are not absolute; you may find a fine-tuned configuration that works better on a case-by-case basis.

A number of variables are set by the following functions.

CUDA_VERSION

The version of CUDA being used. These functions will attempt to discover the CUDA Toolkit in commonly known locations and gather a list of all discovered CUDA versions in the sorted array CUDA_VERSIONS. The most capable version of CUDA is then picked and set as CUDA_VERSION.

CUDA_VERSION can optionally be set to a specific version (e.g. “7.5.13”), in which case other CUDA versions will not be discovered and CUDA_VERSIONS will not be populated.
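
For example, a minimal sketch (assuming cuda_info.bsh is in the current directory; adjust the path to your checkout) that pins the toolkit version instead of discovering it:

CUDA_VERSION=7.5.13       # pin the toolkit; discovery is skipped
source cuda_info.bsh      # hypothetical path; adjust to your checkout
discover_cuda_versions
echo "${CUDA_VERSION}"    # still 7.5.13; CUDA_VERSIONS was not populated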

Note

Currently, CUDA Toolkits are discovered by checking the system PATH and /usr/local/cuda*/bin/ directories for the nvcc executable. More paths should be added to this file as they become necessary.

CUDA_VERSIONS

A list of all the CUDA versions discovered.

CUDA_ARCHES

An array of CUDA “virtual” instruction sets supported by CUDA. Every version of CUDA (nvcc) has a set of “virtual” compute_xx architectures (ISAs) that it can build against when compiling code for “real” sm_xx architectures.

This array contains the list of the compute (virtual) architectures supported by the CUDA_VERSION version of CUDA as an array of two digit numbers.

Example

$ echo "${CUDA_ARCHES[@]}"
20 30 32 35 37 50 52 53 60 61 62

Adding the periods to the architecture version number:

# Convert each two digit arch (e.g. "20") to its dotted form (e.g. "2.0")
# by splitting off the last digit
y=()
for x in ${CUDA_ARCHES[@]+"${CUDA_ARCHES[@]}"}; do
  y+=("${x:0:${#x}-1}.${x:${#x}-1:1}")
done

$ echo "${y[@]}"
2.0 3.0 3.2 3.5 3.7 5.0 5.2 5.3 6.0 6.1 6.2

See also

CUDA_DEPRECATED

CUDA_CODES

An array of CUDA “real” instruction sets supported by CUDA. Every version of CUDA (nvcc) has a set of “real” sm_xx architectures that it can assemble native (CUDA binary) code for.

This array contains a list of the sm architectures supported by the CUDA_VERSION version of CUDA as an array of two digit numbers.

See also

CUDA_DEPRECATED

CUDA_LTOS

Starting in CUDA 11, NVIDIA introduced Link Time Optimization (lto_) architectures in addition to the already existing sm_ and compute_ architectures.

Separately compiled kernels may not have as high performance as if they were compiled together with the rest of the executable code, because code cannot be inlined across file boundaries. Link-time optimization is a way to recover that lost performance.
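
As a sketch of how these architectures appear on the nvcc command line (the -dlto and code=lto_XX spellings are from the CUDA 11+ toolchain; file names are hypothetical):

nvcc -dc -gencode=arch=compute_80,code=lto_80 a.cu -o a.o
nvcc -dc -gencode=arch=compute_80,code=lto_80 b.cu -o b.o
nvcc -dlto -arch=sm_80 a.o b.o -o program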

CUDA_MINIMAL_DRIVER_VERSION

Every version of CUDA has a minimal version of the NVIDIA graphics-card driver that must be installed in order to support that version of CUDA. This was largely undocumented until CUDA 10 came out, despite being obviously important. This variable is set to the minimum required version of the NVIDIA driver for the CUDA_VERSION version of CUDA, as best as we’ve been able to determine.
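
For example, a minimal sketch (assuming nvidia-smi is on the PATH) that checks the installed driver against this minimum using a version sort:

driver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
lowest="$(printf '%s\n' "${CUDA_MINIMAL_DRIVER_VERSION}" "${driver}" | sort -V | head -n 1)"
if [ "${lowest}" = "${CUDA_MINIMAL_DRIVER_VERSION}" ]; then
  echo "Driver ${driver} can run CUDA ${CUDA_VERSION}"
fi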

CUDA_COMPATIBLE_DRIVER_VERSION

Starting in CUDA 11, NVIDIA introduced a cuda-compat package that can be installed on machines with older NVIDIA drivers for increased compatibility with those drivers. This is the minimum supported driver version that can run the cuda-compat runtime.

CUDA_DEPRECATED

Some versions of CUDA support old instruction sets, but print a deprecated warning. For those versions of CUDA, a CUDA_DEPRECATED array is defined to list the two digit architectures that are supported yet deprecated in the CUDA_VERSION version of CUDA.

CUDA_FORWARD_PTX

PTX arch for newer CUDA cards. In situations where you are making portable fatbinaries, you should compile for every architecture. However, in order to future proof your fatbin for architectures newer than your current version of CUDA supports, you will need to compile to a pure virtual architecture using the PTX feature so that the real architecture can be JIT (Just-In-Time) compiled.

CUDA_FORWARD_PTX identifies the fullest-featured PTX architecture so that you can choose to add this to your builds.
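
An illustrative sketch of embedding that PTX in a fatbinary with nvcc (file names are hypothetical):

nvcc kernel.cu -o kernel \
  -gencode=arch=compute_${CUDA_FORWARD_PTX},code=compute_${CUDA_FORWARD_PTX}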

CUDA_CARDS

The names of the CUDA cards. Exact values may vary, e.g. “Titan X (Pascal)” vs “Titan Xp”.

CUDA_CARD_FAMILIES

The family name of each card in CUDA_CARDS.

CUDA_CARD_ARCHES

The specific CUDA architecture a card natively supports, for each card in CUDA_CARDS.

CUDA_SUGGESTED_ARCHES

Suggested “virtual” architectures to compile for. Instead of compiling for every architecture that the CUDA_VERSION version of CUDA supports, CUDA_SUGGESTED_ARCHES is the intersection between CUDA_CARD_ARCHES and CUDA_ARCHES so that you compile only for your cards.

CUDA_SUGGESTED_CODES

Suggested “real” architectures to compile for. Instead of compiling for every architecture that the CUDA_VERSION version of CUDA supports, CUDA_SUGGESTED_CODES is the intersection between CUDA_CARD_ARCHES and CUDA_CODES so that you compile only for your cards.

CUDA_SUGGESTED_PTX

Suggested PTX architectures to compile for. If your graphics card is too new for the CUDA_VERSION version of CUDA, you will need to compile to a pure virtual architecture (by embedding PTX code in the fatbinary) in order to use it. That way, the real architecture can be JIT (Just-In-Time) compiled for at runtime.

CUDA_SUGGESTED_PTX identifies the PTX architectures you need to run on newer (unrecognized) cards. You can choose to add them to your builds.

discover_cuda_versions

Find CUDA development kits

Output:
  • CUDA_VERSIONS
  • CUDA_VERSION

Note

Will not work on macOS if it has an NVIDIA card and two or more versions of CUDA installed.

nvidia_smi_cuda_version

Starting at NVIDIA driver version 410.72, nvidia-smi began listing the maximum version of CUDA that the driver supports. This function can be used to parse that.

Parameters:
  • NVIDIA_SMI - Optional path to a specific nvidia-smi. Default: nvidia-smi (using the system path)

Output:

stdout - echoes out the version of CUDA that nvidia-smi says it supports. If this version of nvidia-smi does not report it, nothing is output.
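
Illustrative usage (the output is hypothetical and depends on your driver):

$ nvidia_smi_cuda_version
11.4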

device_query_cuda_capability

Device query is one of the sample programs in the CUDA Samples that prints out useful information about the connected CUDA devices.

The deviceQuery executable is compiled from the source code typically found in /usr/local/cuda/samples/1_Utilities/deviceQuery/, but it can be downloaded precompiled for Linux from https://goo.gl/equvX3

Parameters:
  • DEVICE_QUERY - Optional path to a specific deviceQuery. Default: deviceQuery (using the system path)

Output:
  • CUDA_CARD_ARCHES
  • CUDA_CARDS

Return Value:
  • 0 - No errors

  • Non-zero - If the output of deviceQuery does not contain any CUDA capabilities, then something likely failed. This is usually caused by running a container without the NVIDIA extensions or having an insufficient NVIDIA driver for the version of CUDA deviceQuery is compiled for.

nvidia_smi_cuda_capability

Starting with NVIDIA driver versions above 500, nvidia-smi provides the compute_cap query property. This function can be used to parse that.

Parameters:
  • NVIDIA_SMI - Optional path to a specific nvidia-smi. Default: nvidia-smi (using the system path)

Output:
  • CUDA_CARD_ARCHES
  • CUDA_CARDS

Return Value:
  • 0 - No errors

  • Non-zero - If the output of nvidia-smi does not contain any CUDA cards, then something likely failed.
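
The query being parsed is along these lines (a sketch; the exact invocation inside the function may differ):

$ nvidia-smi --query-gpu=compute_cap,name --format=csv,noheader
8.6, NVIDIA GeForce RTX 3080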

wmic_cuda_capability

Parameters:
  • WMIC - Optional path to a specific wmic. Default: wmic.exe (using the system path)

Output:
  • CUDA_CARD_ARCHES
  • CUDA_CARDS

Return Value:
  • 0 - No errors

  • Non-zero - If the output of wmic.exe does not contain any CUDA cards, then something likely failed.
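
On Windows, the underlying card listing is along the lines of (illustrative output):

> wmic path Win32_VideoController get Name
Name
NVIDIA GeForce RTX 3080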

nvidia_docker_plugin_cuda_capability

The deprecated nvidia-docker v1 API actually hosted GPU information, which was useful for determining the CUDA arches. This is rarely used now.

Parameters:
  • NV_HOST - Environment variable to optionally specify a custom NVIDIA host for the GPUs. Default: http://localhost:3476

Output:
  • CUDA_CARD_ARCHES
  • CUDA_CARDS
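
A sketch of querying the plugin directly (the /v1.0/gpu/info endpoint is from the nvidia-docker v1 REST API; verify it against your plugin version):

curl "${NV_HOST-http://localhost:3476}/v1.0/gpu/info"
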
cuda_arch_to_cuda_family

Determines CUDA family names based off of the CUDA arches stored in CUDA_CARD_ARCHES.

Parameters:
  • CUDA_CARD_ARCHES

Output:
  • CUDA_CARD_FAMILIES

discover_cuda_info

Get CUDA info about each card

Output:
  • CUDA_CARD_ARCHES
  • CUDA_CARDS

cuda_capabilities

Determine compiler capabilities for a specific CDK

Parameters:
  • $1 - The CUDA version to check against. This is typically the nvcc version, but the docs version should work too. In a pinch, the plain CUDA version will work, to some extent.

Output:
  • CUDA_ARCHES
  • CUDA_CODES
  • CUDA_LTOS
  • CUDA_DEPRECATED
  • CUDA_MINIMAL_DRIVER_VERSION
  • CUDA_COMPATIBLE_DRIVER_VERSION
  • CUDA_FORWARD_PTX

suggested_architectures

Calculate suggested architectures

Output:
  • CUDA_SUGGESTED_ARCHES
  • CUDA_SUGGESTED_CODES
  • CUDA_SUGGESTED_PTX

cmake_cuda_flags

Generate CUDA flags for CMake

Parameters:
  • CUDA_SUGGESTED_ARCHES, CUDA_SUGGESTED_CODES, CUDA_SUGGESTED_PTX - The suggested architecture arrays, as computed by suggested_architectures

Output:

stdout - echoes out the value of the target_CUDA_architectures

Modern CMake installs include a FindCUDA.cmake script which calls the select_compute_arch.cmake script (https://goo.gl/uZvAjR). It uses a limited version of the tables that cuda_info.bsh uses and is prone to being out of date.

This function will calculate the suggested value of target_CUDA_architectures for CMake’s:

FindCUDA.cmake:select_compute_arch.cmake:CUDA_SELECT_NVCC_ARCH_FLAGS

You will need to find where this is used and set the variable accordingly.

Example

For example, PyTorch’s CMake contains:

CUDA_SELECT_NVCC_ARCH_FLAGS(NVCC_FLAGS_EXTRA $ENV{TORCH_CUDA_ARCH_LIST})

Setting the environment variable TORCH_CUDA_ARCH_LIST to the output of cmake_cuda_flags will result in using the desired CUDA architecture and code versions.

To add the CUDA_FORWARD_PTX architecture, run:

CUDA_SUGGESTED_PTX+=("${CUDA_FORWARD_PTX}")

before calling cmake_cuda_flags.
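
Putting it together, a hypothetical end-to-end sketch for a PyTorch source build:

CUDA_SUGGESTED_PTX+=("${CUDA_FORWARD_PTX}")
TORCH_CUDA_ARCH_LIST="$(cmake_cuda_flags)" python setup.py install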

cmake_cuda_architectures

Generate CUDA capabilities suitable for the CMAKE_CUDA_ARCHITECTURES CMake variable. See the CMake docs for more information: https://cmake.org/cmake/help/latest/variable/CMAKE_CUDA_ARCHITECTURES.html

Parameters:
  • CUDA_SUGGESTED_ARCHES, CUDA_SUGGESTED_CODES, CUDA_SUGGESTED_PTX - The suggested architecture arrays, as computed by suggested_architectures

Output:

stdout - semicolon delimited string of CUDA capabilities for CMake
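
Illustrative usage (a sketch):

cmake -DCMAKE_CUDA_ARCHITECTURES="$(cmake_cuda_architectures)" -S . -B build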

torch_cuda_arch_list

Generate CUDA capabilities suitable for the TORCH_CUDA_ARCH_LIST environment variable. See the PyTorch docs for more information: https://pytorch.org/docs/stable/cpp_extension.html#torch.utils.cpp_extension.CUDAExtension

Parameters:
  • CUDA_SUGGESTED_ARCHES, CUDA_SUGGESTED_CODES, CUDA_SUGGESTED_PTX - The suggested architecture arrays, as computed by suggested_architectures

Output:

stdout - comma delimited string of CUDA architectures for PyTorch
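
Illustrative usage (a sketch):

TORCH_CUDA_ARCH_LIST="$(torch_cuda_arch_list)" pip install .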

torch_cuda_arch_list_from_cmake
Arguments:
  • $1 - delimited string of cuda architectures

Output:

stdout - string containing TORCH_CUDA_ARCH_LIST statement

Create a TORCH_CUDA_ARCH_LIST value from a CMAKE_CUDA_ARCHITECTURES statement.

Example

For example, an input of "80-real 86-virtual" would produce the output "8.0 8.6+PTX"
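
As a runnable call, the same example looks like:

$ torch_cuda_arch_list_from_cmake "80-real 86-virtual"
8.0 8.6+PTX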

tcnn_cuda_architectures

Generate CUDA architectures for tiny-cuda-nn, suitable for the TCNN_CUDA_ARCHITECTURES environment variable. Note that tiny-cuda-nn will always build both binaries and PTX intermediate code for each specified architecture.

Parameters:
  • CUDA_SUGGESTED_ARCHES, CUDA_SUGGESTED_CODES, CUDA_SUGGESTED_PTX - The suggested architecture arrays, as computed by suggested_architectures

Output:

stdout - comma delimited string of CUDA architectures for tiny-cuda-nn

tcnn_cuda_architectures_from_cmake
Arguments:
  • $1 - delimited string of cuda architectures

Output:

stdout - string containing TCNN_CUDA_ARCHITECTURES statement

Create a TCNN_CUDA_ARCHITECTURES value from a CMAKE_CUDA_ARCHITECTURES statement.

Example

For example, an input of "80-real 86-virtual" would produce the output "80 86"
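
As a runnable call, the same example looks like:

$ tcnn_cuda_architectures_from_cmake "80-real 86-virtual"
80 86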

nvcc_gencodes
Arguments:
  • $1 - delimited string of cuda architectures

Output:

stdout - string containing nvcc gencode statements

Create nvcc gencode statements from an input string of CUDA capabilities.

Example

For example, an input of "75 86+PTX" would produce the output

-gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_86,code=compute_86
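
These statements can be passed straight to nvcc; a hypothetical compile sketch (the command substitution is intentionally left unquoted so the flags word-split):

nvcc kernel.cu $(nvcc_gencodes "75 86+PTX") -o kernel
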
discover_cuda_all

Helper function to call all the discovery code in one call.

There are two methods for CUDA device discovery (in order):

  1. Using deviceQuery (available here https://goo.gl/ocBgPU)

  • looks for ${DEVICE_QUERY-deviceQuery} on the PATH

  2. Using the nvidia-docker-plugin to get and parse GPU information

  • discovered using either NV_HOST or by checking whether nvidia-docker-plugin is running locally using pgrep or ps

When running in a docker, deviceQuery is the preferred method. NV_HOST could be used, but that involves telling the docker the IP of the host, or using a shared network mode in order to use localhost (which is not recommended for production). Attempting to discover nvidia-docker-plugin will not work in a docker.

Output:

All of the CUDA_* variables described above.

Note

If deviceQuery is not used, then an internal lookup table is used, but it only supports Tesla, Fermi, Kepler, Maxwell, Pascal, Volta, Turing, and Ampere. Additional family names need to be added as they are released.

has_gpu_device

Tells you if the computer has any NVIDIA GPUs. This checks whether they physically exist, not whether they are currently working or have working drivers. It is usually used to determine if you need to install drivers, not if you have GPUs ready to use.

Return Value:
  • 0 - The machine has an NVIDIA GPU.

  • 1 - No NVIDIA GPUs were found.

See also

is_gpu_setup

is_gpu_setup

Checks to see if you have any GPUs installed and working.

Return Value:
  • 0 - There is at least one NVIDIA GPU available to use.

  • 1 - No NVIDIA GPUs were available.

  • If there is a problem with the NVIDIA drivers resulting in the kernel module not loading, then this check will return false.

  • If you are in a docker and did not route the GPU in correctly, it will return false.

See also

has_gpu_device
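
A sketch of combining the two checks, e.g. in a provisioning script:

if has_gpu_device && ! is_gpu_setup; then
  echo "NVIDIA GPU found, but not usable; check the driver installation" >&2
fi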