CUDA Information

cuda_info.bsh

Determine easy-to-use capabilities of CUDA devices

There are many versions of CUDA, many CUDA card architectures, and so on. Knowing how to compile for a specific card is hard enough, and it is even harder to know which architectures are right for your specific card and what the limitations are given your version of CUDA, NVIDIA driver, etc. This script helps determine what versions you have and suggests what architectures to use. That is good enough for an automated solution to get up and running, but the suggestions are not absolute; you may find a fine-tuned configuration that works better on a case-by-case basis.

A number of variables are set by the following functions.

CUDA_VERSION

The version of CUDA being used. These functions will attempt to discover the CUDA Toolkit in commonly known locations and gather a list of all discovered CUDA versions in the sorted array CUDA_VERSIONS. The most capable version of CUDA is then picked and set as CUDA_VERSION.

CUDA_VERSION can optionally be set to a specific version (e.g. “7.5.13”), in which case other CUDA versions will not be discovered and CUDA_VERSIONS will not be populated.
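
For example, a minimal sketch (assuming cuda_info.bsh is in the current directory; adjust the path to your checkout) that pins the toolkit version instead of discovering it:

CUDA_VERSION=7.5.13       # pin the toolkit; discovery is skipped
source cuda_info.bsh      # hypothetical path; adjust to your checkout
discover_cuda_versions
echo "${CUDA_VERSION}"    # still 7.5.13; CUDA_VERSIONS was not populated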

Note

Currently, CUDA Toolkits are discovered by checking the system PATH and /usr/local/cuda*/bin/ directories for the nvcc executable. More paths should be added to this file as they become necessary.

CUDA_VERSIONS

A list of all the CUDA versions discovered.

CUDA_ARCHES

An array of CUDA “virtual” instruction sets supported by CUDA. Every version of CUDA (nvcc) has a set of “virtual” compute_xx architectures (ISAs) that it can build against when compiling code for “real” sm_xx architectures.

This array contains the list of the compute (virtual) architectures supported by the CUDA_VERSION version of CUDA as an array of two digit numbers.

Example

$ echo "${CUDA_ARCHES[@]}"
20 30 32 35 37 50 52 53 60 61 62

Adding the periods to the architecture version number:

# Convert each two digit arch (e.g. "20") to its dotted form (e.g. "2.0")
# by splitting off the last digit
y=()
for x in ${CUDA_ARCHES[@]+"${CUDA_ARCHES[@]}"}; do
  y+=("${x:0:${#x}-1}.${x:${#x}-1:1}")
done

$ echo "${y[@]}"
2.0 3.0 3.2 3.5 3.7 5.0 5.2 5.3 6.0 6.1 6.2

See also

CUDA_DEPRECATED

CUDA_CODES

An array of CUDA “real” instruction sets supported by CUDA. Every version of CUDA (nvcc) has a set of “real” sm_xx architectures that it can assemble native (CUDA binary) code for.

This array contains a list of the sm architectures supported by the CUDA_VERSION version of CUDA as an array of two digit numbers.

See also

CUDA_DEPRECATED

CUDA_LTOS

Starting in CUDA 11, NVIDIA introduced Link Time Optimization (lto_) architectures in addition to the already existing sm_ and compute_ architectures.

Separately compiled kernels may not have as high performance as if they were compiled together with the rest of the executable code, because code cannot be inlined across file boundaries. Link-time optimization is a way to recover that lost performance.
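
As a sketch of how these architectures appear on the nvcc command line (the -dlto and code=lto_XX spellings are from the CUDA 11+ toolchain; file names are hypothetical):

nvcc -dc -gencode=arch=compute_80,code=lto_80 a.cu -o a.o
nvcc -dc -gencode=arch=compute_80,code=lto_80 b.cu -o b.o
nvcc -dlto -arch=sm_80 a.o b.o -o program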

CUDA_MINIMAL_DRIVER_VERSION

Every version of CUDA has a minimal version of the NVIDIA graphics-card driver that must be installed in order to support that version of CUDA. This was largely undocumented until CUDA 10 came out, despite being obviously important. This variable is set to the minimum required version of the NVIDIA driver for the CUDA_VERSION version of CUDA, as best as we’ve been able to determine.
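
For example, a minimal sketch (assuming nvidia-smi is on the PATH) that checks the installed driver against this minimum using a version sort:

driver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
lowest="$(printf '%s\n' "${CUDA_MINIMAL_DRIVER_VERSION}" "${driver}" | sort -V | head -n 1)"
if [ "${lowest}" = "${CUDA_MINIMAL_DRIVER_VERSION}" ]; then
  echo "Driver ${driver} can run CUDA ${CUDA_VERSION}"
fi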

CUDA_COMPATIBLE_DRIVER_VERSION

Starting in CUDA 11, NVIDIA introduced a cuda-compat package that can be installed on machines with older NVIDIA drivers for increased compatibility with those drivers. This is the minimum supported driver version that can run the cuda-compat runtime.

CUDA_DEPRECATED

Some versions of CUDA support old instruction sets, but print a deprecated warning. For those versions of CUDA, a CUDA_DEPRECATED array is defined to list the two digit architectures that are supported yet deprecated in the CUDA_VERSION version of CUDA.

CUDA_FORWARD_PTX

PTX arch for newer CUDA cards. In situations where you are making portable fatbinaries, you should compile for every architecture. However, in order to future proof your fatbin for architectures newer than your current version of CUDA supports, you will need to compile to a pure virtual architecture using the PTX feature so that the real architecture can be JIT (Just-In-Time) compiled.

CUDA_FORWARD_PTX identifies the fullest-featured PTX architecture so that you can choose to add this to your builds.
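
An illustrative sketch of embedding that PTX in a fatbinary with nvcc (file names are hypothetical):

nvcc kernel.cu -o kernel \
  -gencode=arch=compute_${CUDA_FORWARD_PTX},code=compute_${CUDA_FORWARD_PTX}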

CUDA_CARDS

The names of the CUDA cards. Exact values may vary, e.g. “Titan X (Pascal)” vs “Titan Xp”.

CUDA_CARD_FAMILIES

The family name of each card in CUDA_CARDS.

CUDA_CARD_ARCHES

The specific CUDA architecture a card natively supports, for each card in CUDA_CARDS.

CUDA_SUGGESTED_ARCHES

Suggested “virtual” architectures to compile for. Instead of compiling for every architecture that the CUDA_VERSION version of CUDA supports, CUDA_SUGGESTED_ARCHES is the intersection between CUDA_CARD_ARCHES and CUDA_ARCHES so that you compile only for your cards.

CUDA_SUGGESTED_CODES

Suggested “real” architectures to compile for. Instead of compiling for every architecture that the CUDA_VERSION version of CUDA supports, CUDA_SUGGESTED_CODES is the intersection between CUDA_CARD_ARCHES and CUDA_CODES so that you compile only for your cards.

CUDA_SUGGESTED_PTX

Suggested PTX architectures to compile for. If your graphics card is too new for the CUDA_VERSION version of CUDA, you will need to compile to a pure virtual architecture (by embedding PTX code in the fatbinary) in order to use it. That way, the real architecture can be JIT (Just-In-Time) compiled for at runtime.

CUDA_SUGGESTED_PTX identifies the PTX architectures you need to run on newer (unrecognized) cards. You can choose to add them to your builds.

discover_cuda_versions

Find CUDA development kits

Output:
  • CUDA_VERSIONS
  • CUDA_VERSION

Note

Will not work on macOS if it has an NVIDIA card and two or more versions of CUDA installed.

nvidia_smi_cuda_version

Starting at NVIDIA driver version 410.72, nvidia-smi began listing the maximum version of CUDA that the driver supports. This function can be used to parse that.

Parameters:
  • NVIDIA_SMI - Optional path to a specific nvidia-smi. Default: nvidia-smi (using the system path)

Output:

stdout - echoes out the version of CUDA that nvidia-smi says it supports. If this version of nvidia-smi does not report it, nothing is output.
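
Illustrative usage (the output is hypothetical and depends on your driver):

$ nvidia_smi_cuda_version
11.4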

device_query_cuda_capability

Device query is one of the sample programs in the CUDA Samples that prints out useful information about the connected CUDA devices.

The deviceQuery executable is compiled from the source code typically found in /usr/local/cuda/samples/1_Utilities/deviceQuery/, but it can be downloaded precompiled for Linux from https://goo.gl/equvX3

Parameters:
  • DEVICE_QUERY - Optional path to a specific deviceQuery. Default: deviceQuery (using the system path)

Output:
  • CUDA_CARD_ARCHES
  • CUDA_CARDS

Return Value:
  • 0 - No errors

  • Non-zero - If the output of deviceQuery does not contain any CUDA capabilities, then something likely failed. This is usually caused by running a container without the NVIDIA extensions or having an insufficient NVIDIA driver for the version of CUDA deviceQuery is compiled for.

nvidia_smi_cuda_capability

Starting with NVIDIA driver versions above 500, nvidia-smi provides the compute_cap query property. This function can be used to parse that.

Parameters:
  • NVIDIA_SMI - Optional path to a specific nvidia-smi. Default: nvidia-smi (using the system path)

Output:
  • CUDA_CARD_ARCHES
  • CUDA_CARDS

Return Value:
  • 0 - No errors

  • Non-zero - If the output of nvidia-smi does not contain any CUDA cards, then something likely failed.
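
The query being parsed is along these lines (a sketch; the exact invocation inside the function may differ):

$ nvidia-smi --query-gpu=compute_cap,name --format=csv,noheader
8.6, NVIDIA GeForce RTX 3080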

wmic_cuda_capability

Parameters:
  • WMIC - Optional path to a specific wmic. Default: wmic.exe (using the system path)

Output:
  • CUDA_CARD_ARCHES
  • CUDA_CARDS

Return Value:
  • 0 - No errors

  • Non-zero - If the output of wmic.exe does not contain any CUDA cards, then something likely failed.
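
On Windows, the underlying card listing is along the lines of (illustrative output):

> wmic path Win32_VideoController get Name
Name
NVIDIA GeForce RTX 3080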

nvidia_docker_plugin_cuda_capability

The deprecated nvidia-docker v1 API actually hosted GPU information, which was useful for determining the CUDA arches. This is rarely used now.

Parameters:
  • NV_HOST - Environment variable to optionally specify a custom NVIDIA host for the GPUs. Default: http://localhost:3476

Output:
  • CUDA_CARD_ARCHES
  • CUDA_CARDS
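
A sketch of querying the plugin directly (the /v1.0/gpu/info endpoint is from the nvidia-docker v1 REST API; verify it against your plugin version):

curl "${NV_HOST-http://localhost:3476}/v1.0/gpu/info"
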
cuda_arch_to_cuda_family

Determines CUDA family names based off of the CUDA arches stored in CUDA_CARD_ARCHES.

Parameters:
  • CUDA_CARD_ARCHES

Output:
  • CUDA_CARD_FAMILIES

discover_cuda_info

Get CUDA info about each card

Output:
  • CUDA_CARD_ARCHES
  • CUDA_CARDS

cuda_capabilities

Determine compiler capabilities for a specific CDK

Parameters:
  • $1 - The CUDA version to check against. This is typically the nvcc version, but the docs version should work too. In a pinch, the plain CUDA version will work, to some extent.

Output:
  • CUDA_ARCHES
  • CUDA_CODES
  • CUDA_LTOS
  • CUDA_DEPRECATED
  • CUDA_MINIMAL_DRIVER_VERSION
  • CUDA_COMPATIBLE_DRIVER_VERSION
  • CUDA_FORWARD_PTX

suggested_architectures

Calculate suggested architectures

Output:
  • CUDA_SUGGESTED_ARCHES
  • CUDA_SUGGESTED_CODES
  • CUDA_SUGGESTED_PTX

cmake_cuda_flags

Generate CUDA flags for CMake

Parameters:
  • CUDA_SUGGESTED_ARCHES, CUDA_SUGGESTED_CODES, CUDA_SUGGESTED_PTX - The suggested architecture arrays, as computed by suggested_architectures

Output:

stdout - echoes out the value of the target_CUDA_architectures

Modern CMake installs include a FindCUDA.cmake script which calls the select_compute_arch.cmake script (https://goo.gl/uZvAjR). It uses a limited version of the tables that cuda_info.bsh uses and is prone to being out of date.

This function will calculate the suggested value of target_CUDA_architectures for CMake’s:

FindCUDA.cmake:select_compute_arch.cmake:CUDA_SELECT_NVCC_ARCH_FLAGS

You will need to find where this is used and set the variable accordingly.

Example

For example, PyTorch’s CMake contains:

CUDA_SELECT_NVCC_ARCH_FLAGS(NVCC_FLAGS_EXTRA $ENV{TORCH_CUDA_ARCH_LIST})

Setting the environment variable TORCH_CUDA_ARCH_LIST to the output of cmake_cuda_flags will result in using the desired CUDA architecture and code versions.

To add the CUDA_FORWARD_PTX architecture, run:

CUDA_SUGGESTED_PTX+=("${CUDA_FORWARD_PTX}")

before calling cmake_cuda_flags.
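
Putting it together, a hypothetical end-to-end sketch for a PyTorch source build:

CUDA_SUGGESTED_PTX+=("${CUDA_FORWARD_PTX}")
TORCH_CUDA_ARCH_LIST="$(cmake_cuda_flags)" python setup.py install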

cmake_cuda_architectures

Generate CUDA capabilities suitable for the CMAKE_CUDA_ARCHITECTURES CMake variable. See the CMake docs for more information: https://cmake.org/cmake/help/latest/variable/CMAKE_CUDA_ARCHITECTURES.html

Parameters:
  • CUDA_SUGGESTED_ARCHES, CUDA_SUGGESTED_CODES, CUDA_SUGGESTED_PTX - The suggested architecture arrays, as computed by suggested_architectures

Output:

stdout - semicolon delimited string of CUDA capabilities for CMake
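
Illustrative usage (a sketch):

cmake -DCMAKE_CUDA_ARCHITECTURES="$(cmake_cuda_architectures)" -S . -B build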

torch_cuda_arch_list

Generate CUDA capabilities suitable for the TORCH_CUDA_ARCH_LIST environment variable. See the PyTorch docs for more information: https://pytorch.org/docs/stable/cpp_extension.html#torch.utils.cpp_extension.CUDAExtension

Parameters:
  • CUDA_SUGGESTED_ARCHES, CUDA_SUGGESTED_CODES, CUDA_SUGGESTED_PTX - The suggested architecture arrays, as computed by suggested_architectures

Output:

stdout - comma delimited string of CUDA architectures for PyTorch
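
Illustrative usage (a sketch):

TORCH_CUDA_ARCH_LIST="$(torch_cuda_arch_list)" pip install .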

torch_cuda_arch_list_from_cmake
Arguments:
  • $1 - delimited string of cuda architectures

Output:

stdout - string containing TORCH_CUDA_ARCH_LIST statement

Create a TORCH_CUDA_ARCH_LIST value from a CMAKE_CUDA_ARCHITECTURES statement.

Example

For example, an input of "80-real 86-virtual" would produce the output "8.0 8.6+PTX"
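
As a runnable call, the same example looks like:

$ torch_cuda_arch_list_from_cmake "80-real 86-virtual"
8.0 8.6+PTX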

tcnn_cuda_architectures

Generate CUDA architectures for tiny-cuda-nn, suitable for the TCNN_CUDA_ARCHITECTURES environment variable. Note that tiny-cuda-nn will always build both binaries and PTX intermediate code for each specified architecture.

Parameters:
  • CUDA_SUGGESTED_ARCHES, CUDA_SUGGESTED_CODES, CUDA_SUGGESTED_PTX - The suggested architecture arrays, as computed by suggested_architectures

Output:

stdout - comma delimited string of CUDA architectures for tiny-cuda-nn

tcnn_cuda_architectures_from_cmake
Arguments:
  • $1 - delimited string of cuda architectures

Output:

stdout - string containing TCNN_CUDA_ARCHITECTURES statement

Create a TCNN_CUDA_ARCHITECTURES value from a CMAKE_CUDA_ARCHITECTURES statement.

Example

For example, an input of "80-real 86-virtual" would produce the output "80 86"
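
As a runnable call, the same example looks like:

$ tcnn_cuda_architectures_from_cmake "80-real 86-virtual"
80 86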

nvcc_gencodes
Arguments:
  • $1 - delimited string of cuda architectures

Output:

stdout - string containing nvcc gencode statements

Create nvcc gencode statements from an input string of CUDA capabilities.

Example

For example, an input of "75 86+PTX" would produce the output

-gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_86,code=compute_86
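
These statements can be passed straight to nvcc; a hypothetical compile sketch (the command substitution is intentionally left unquoted so the flags word-split):

nvcc kernel.cu $(nvcc_gencodes "75 86+PTX") -o kernel
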
discover_cuda_all

Helper function to call all the discovery code in one call.

There are two methods for CUDA device discovery (in order):

  1. Using deviceQuery (available here https://goo.gl/ocBgPU)

  • looks for ${DEVICE_QUERY-deviceQuery} on the PATH

  2. Using the nvidia-docker-plugin to get and parse GPU information

  • discovered using either NV_HOST or by checking whether nvidia-docker-plugin is running locally using pgrep or ps

When running in a docker, deviceQuery is the preferred method. NV_HOST could be used, but that involves telling the docker the IP of the host, or using a shared network mode in order to use localhost (which is not recommended for production). Attempting to discover nvidia-docker-plugin will not work in a docker.

Output:

All of the CUDA_* variables described above.

Note

If deviceQuery is not used, then an internal lookup table is used, but it only supports Tesla, Fermi, Kepler, Maxwell, Pascal, Volta, Turing, and Ampere. Additional family names need to be added as they are released.

has_gpu_device

Tells you if the computer has any NVIDIA GPUs. This checks whether they physically exist, not whether they are currently working or have working drivers. It is usually used to determine if you need to install drivers, not if you have GPUs ready to use.

Return Value:
  • 0 - The machine has an NVIDIA GPU.

  • 1 - No NVIDIA GPUs were found.

See also

is_gpu_setup

is_gpu_setup

Checks to see if you have any GPUs installed and working.

Return Value:
  • 0 - There is at least one NVIDIA GPU available to use.

  • 1 - No NVIDIA GPUs were available.

  • If there is a problem with the NVIDIA drivers resulting in the kernel module not loading, then this check will return false.

  • If you are in a docker and did not route the GPU in correctly, it will return false.

See also

has_gpu_device
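
A sketch of combining the two checks, e.g. in a provisioning script:

if has_gpu_device && ! is_gpu_setup; then
  echo "NVIDIA GPU found, but not usable; check the driver installation" >&2
fi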