================
CUDA Information
================

.. default-domain:: bash

.. file:: cuda_info.bsh

Determine easy-to-use capabilities for CUDA devices.

There are many versions of CUDA, CUDA card architectures, and so on. Compiling for a specific card is hard enough; it is harder still to know which architectures are right for your card and what limitations your version of CUDA and the NVIDIA driver impose. This script helps determine what versions you have and suggests which architectures to use. The suggestions are good enough for an automated solution to get up and running, but they are not absolute: a fine-tuned configuration may work better on a case-by-case basis.


A number of variables are set by the following functions.

.. var:: CUDA_VERSION

The version of CUDA being used. These functions will attempt to discover the CUDA Toolkit in commonly known locations and gather a list of all discovered CUDAs in the sorted array :var:`CUDA_VERSIONS`. Then, the highest capable version of CUDA is picked and set to :var:`CUDA_VERSION`.

:var:`CUDA_VERSION` can optionally be set to a specific version (e.g. "7.5.13"), in which case other CUDA versions will not be discovered and :var:`CUDA_VERSIONS` will not be populated.

.. note::

   Currently, CUDA Toolkits are discovered by checking the system ``PATH`` and ``/usr/local/cuda*/bin/`` directories for the nvcc executable. More paths should be added to this file as they become necessary.
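
As a rough illustration of the discovery described in the note above, the following sketch looks for ``nvcc`` in each ``PATH`` entry and in ``/usr/local/cuda*/bin``. The helper name ``find_nvcc_in_dirs`` is hypothetical and is not part of :file:`cuda_info.bsh`; the real implementation may differ.

.. code-block:: bash

   # Hypothetical helper: print every nvcc executable found in the given
   # directories, one per line.
   find_nvcc_in_dirs()
   {
     local dir
     for dir in "$@"; do
       if [ -f "${dir}/nvcc" ] && [ -x "${dir}/nvcc" ]; then
         echo "${dir}/nvcc"
       fi
     done
   }

   # Search each PATH entry, then the conventional toolkit install locations.
   # An unmatched glob stays literal and simply fails the checks above.
   IFS=: read -r -a path_dirs <<< "${PATH}"
   find_nvcc_in_dirs ${path_dirs[@]+"${path_dirs[@]}"} /usr/local/cuda*/bin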

.. var:: CUDA_VERSIONS

List of all :var:`CUDA_VERSION`-s discovered

.. var:: CUDA_ARCHES

An array of CUDA "virtual" instruction sets supported by CUDA. Every version of CUDA (``nvcc``) has a set of "virtual" ``compute_xx`` architectures (ISAs) that it can build against when compiling code for "real" ``sm_xx`` architectures.

This array contains the list of the compute (virtual) architectures supported by the :var:`CUDA_VERSION` version of CUDA as an array of two digit numbers.

.. rubric:: Example

.. code-block:: bash

   $ echo "${CUDA_ARCHES[@]}"
   20 30 32 35 37 50 52 53 60 61 62

   # Adding the periods to the architecture version number:

   y=()
   for x in ${CUDA_ARCHES[@]+"${CUDA_ARCHES[@]}"}; do
     y+=("${x:0:${#x}-1}.${x:${#x}-1:1}")
   done

   $ echo "${y[@]}"
   2.0 3.0 3.2 3.5 3.7 5.0 5.2 5.3 6.0 6.1 6.2

.. seealso::
  :var:`CUDA_DEPRECATED`

.. var:: CUDA_CODES

An array of CUDA "real" instruction sets supported by CUDA. Every version of CUDA (``nvcc``) has a set of "real" ``sm_xx`` architectures that it can assemble native (CUDA binary) code for.

This array contains a list of the sm architectures supported by the :var:`CUDA_VERSION` version of CUDA as an array of two digit numbers.

.. seealso::
  :var:`CUDA_DEPRECATED`

.. var:: CUDA_LTOS

Starting in CUDA 11, NVIDIA introduced Link-Time Optimization (``lto_``) architectures in addition to the existing ``sm_`` and ``compute_`` architectures.

Separately compiled kernels may not perform as well as kernels compiled together with the rest of the executable code, because code cannot be inlined across files. Link-time optimization is a way to recover that performance.

.. var:: CUDA_MINIMAL_DRIVER_VERSION

Every version of CUDA has a minimal version of the NVIDIA graphics-card driver that must be installed in order to support that version of CUDA. This was largely undocumented until CUDA 10 came out, despite being obviously important. This variable is set to the minimum required version of the NVIDIA driver for the :var:`CUDA_VERSION` version of CUDA, as best as we've been able to determine.
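
One way to use this value is to compare it against the installed driver version. The sketch below is only an illustration: ``version_at_least`` is a hypothetical helper, not a function in :file:`cuda_info.bsh`, and both version strings are example values chosen for the demonstration.

.. code-block:: bash

   # Hypothetical helper: succeed if version $1 >= version $2, comparing
   # dot-separated numeric fields from left to right.
   version_at_least()
   {
     local -a have want
     local i h w
     IFS=. read -r -a have <<< "$1"
     IFS=. read -r -a want <<< "$2"
     for (( i=0; i<${#want[@]} || i<${#have[@]}; i++ )); do
       # Missing fields count as 0; 10# forces base 10 so "08" is not octal
       h=$(( 10#${have[i]:-0} ))
       w=$(( 10#${want[i]:-0} ))
       if (( h > w )); then return 0; fi
       if (( h < w )); then return 1; fi
     done
     return 0
   }

   # Example values for illustration only
   installed_driver="450.80.02"
   CUDA_MINIMAL_DRIVER_VERSION="450.36.06"

   if version_at_least "${installed_driver}" "${CUDA_MINIMAL_DRIVER_VERSION}"; then
     echo "Driver is new enough"
   else
     echo "Driver is too old"
   fi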

.. var:: CUDA_COMPATIBLE_DRIVER_VERSION

Starting in CUDA 11, NVIDIA introduced a ``cuda-compat`` package that can be installed on machines with older NVIDIA Drivers for increased compatibility with these drivers. This is the minimum supported driver version that will handle the ``cuda-compat`` runtime.

.. var:: CUDA_DEPRECATED

Some versions of CUDA support old instruction sets, but print a deprecation warning. For those versions of CUDA, a :var:`CUDA_DEPRECATED` array is defined to list the two-digit architectures that are supported yet deprecated in the :var:`CUDA_VERSION` version of CUDA.

.. var:: CUDA_FORWARD_PTX

PTX arch for newer CUDA cards. When building portable fatbinaries, you should compile for every architecture. However, to future-proof your fatbin for architectures newer than your version of CUDA supports, you also need to compile to a pure virtual architecture using the PTX feature so that the real architecture can be JIT (Just-In-Time) compiled.

:var:`CUDA_FORWARD_PTX` identifies the fullest-featured PTX architecture so that you can choose to add this to your builds.

.. var:: CUDA_CARDS

The names of the CUDA cards. Exact values may vary, e.g. "Titan X (Pascal)" vs "Titan Xp".

.. var:: CUDA_CARD_FAMILIES

The family name of each card in :var:`CUDA_CARDS`.

.. var:: CUDA_CARD_ARCHES

The specific CUDA architecture a card natively supports, for each card in :var:`CUDA_CARDS`.

.. var:: CUDA_SUGGESTED_ARCHES

Suggested "virtual" architectures to compile for. Instead of compiling for every architecture that the :var:`CUDA_VERSION` version of CUDA supports, :var:`CUDA_SUGGESTED_ARCHES` is the intersection between :var:`CUDA_CARD_ARCHES` and :var:`CUDA_ARCHES` so that you compile only for your cards.

.. var:: CUDA_SUGGESTED_CODES

Suggested "real" architectures to compile for. Instead of compiling for every architecture that the :var:`CUDA_VERSION` version of CUDA supports, :var:`CUDA_SUGGESTED_CODES` is the intersection between :var:`CUDA_CARD_ARCHES` and :var:`CUDA_CODES` so that you compile only for your cards.
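
The intersection described above can be sketched as follows. This is a minimal illustration, assuming both arrays hold two-digit architecture numbers; the array contents are example values and ``suggested`` is a hypothetical name. Whether the actual script removes duplicates the same way is not confirmed here.

.. code-block:: bash

   # Example values only; normally these are set by the functions in this file
   CUDA_CARD_ARCHES=(61 75 75 90)
   CUDA_ARCHES=(50 52 53 60 61 62 70 72 75)

   # Keep each card architecture that the toolkit supports, without duplicates
   suggested=()
   for card_arch in ${CUDA_CARD_ARCHES[@]+"${CUDA_CARD_ARCHES[@]}"}; do
     for arch in ${CUDA_ARCHES[@]+"${CUDA_ARCHES[@]}"}; do
       if [ "${card_arch}" = "${arch}" ]; then
         # Skip architectures that were already added by an earlier card
         case " ${suggested[*]-} " in
           *" ${arch} "*) ;;
           *) suggested+=("${arch}") ;;
         esac
       fi
     done
   done

   echo "${suggested[@]}"
   # prints: 61 75

The ``90`` card in this example is absent from the toolkit's list, so it is left out of the intersection.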

.. var:: CUDA_SUGGESTED_PTX

Suggested PTX architectures to compile for. If your graphics card is too new for the :var:`CUDA_VERSION` version of CUDA, you will need to compile to a pure virtual architecture (by embedding PTX code in the fatbinary) in order to use it. That way, the real architecture can be JIT (Just-In-Time) compiled at runtime.

:var:`CUDA_SUGGESTED_PTX` identifies the PTX architectures you need to run on newer (unrecognized) cards. You can choose to add them to your builds.

.. function:: discover_cuda_versions

:Output: * :var:`cuda_info.bsh CUDA_VERSIONS`
         * :var:`cuda_info.bsh CUDA_VERSION`

Find CUDA development kits

.. note::

   Will not work on macOS if it has NVIDIA and two or more versions of CUDA installed.

.. function:: nvidia_smi_cuda_version

Starting with NVIDIA driver version 410.72, ``nvidia-smi`` lists the maximum version of CUDA that the driver supports. This function parses that value.

:Parameters: * ``NVIDIA_SMI`` - Optional path to a specific ``nvidia-smi``. Default: ``nvidia-smi`` (using the system path)

:Output: *stdout* - echoes out the version of CUDA that ``nvidia-smi`` says it supports. If this version of ``nvidia-smi`` does not say, nothing is output.
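
A minimal sketch of this kind of parsing is shown below. It assumes the ``nvidia-smi`` header contains the text ``CUDA Version: X.Y``; the helper name ``parse_smi_cuda_version`` and the sample header line are illustrative and not taken from :file:`cuda_info.bsh`.

.. code-block:: bash

   # Hypothetical helper: read nvidia-smi output on stdin and print the CUDA
   # version from its header, or nothing if the header does not list one.
   parse_smi_cuda_version()
   {
     sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
   }

   # Sample header line, assumed to resemble real nvidia-smi output
   sample='| NVIDIA-SMI 410.72       Driver Version: 410.72       CUDA Version: 10.0     |'
   parse_smi_cuda_version <<< "${sample}"
   # prints: 10.0

   # Against a real driver (requires nvidia-smi on the PATH):
   #   "${NVIDIA_SMI-nvidia-smi}" | parse_smi_cuda_version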

.. function:: device_query_cuda_capability

:Output: * :var:`cuda_info.bsh CUDA_CARD_FAMILIES`
         * :var:`cuda_info.bsh CUDA_CARD_ARCHES`
         * :var:`cuda_info.bsh CUDA_CARDS`

:Parameters: * ``DEVICE_QUERY`` - Optional path to a specific ``deviceQuery``. Default: ``deviceQuery`` (using the system path)

:Return Value: * ``0`` - No errors
               * Non-zero - If the output of ``deviceQuery`` does not contain any CUDA capabilities, then something likely failed. This is usually caused by running a container without the NVIDIA extensions or having an insufficient NVIDIA driver for the version of CUDA ``deviceQuery`` is compiled for.

Device query is one of the sample programs in the CUDA Samples that prints out useful information about the connected CUDA devices.

The ``deviceQuery`` executable is compiled from the source code typically found in ``/usr/local/cuda/samples/1_Utilities/deviceQuery/``, but can be downloaded precompiled for Linux from https://goo.gl/equvX3

.. function:: nvidia_smi_cuda_capability

Starting with NVIDIA driver versions above 500, ``nvidia-smi`` provides the ``compute_cap`` query property. This function parses that property.

:Output: * :var:`cuda_info.bsh CUDA_CARD_FAMILIES`
         * :var:`cuda_info.bsh CUDA_CARD_ARCHES`
         * :var:`cuda_info.bsh CUDA_CARDS`

:Parameters: * ``NVIDIA_SMI`` - Optional path to a specific ``nvidia-smi``. Default: ``nvidia-smi`` (using the system path)

:Return Value: * ``0`` - No errors
               * Non-zero - If the output of ``nvidia-smi`` does not contain any CUDA cards, then something likely failed.
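
The ``compute_cap`` property can be queried with ``nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader``, which prints one ``name, capability`` line per GPU. The sketch below parses such lines from sample text. The helper ``parse_compute_cap`` and the arrays ``cards`` and ``card_arches`` are hypothetical names, and storing the capability as a two-digit number is an assumption about the format rather than a description of the actual function.

.. code-block:: bash

   # Hypothetical helper: read "name, compute_cap" CSV lines on stdin and fill
   # the example arrays cards and card_arches.
   parse_compute_cap()
   {
     local name cap
     cards=()
     card_arches=()
     while IFS=, read -r name cap; do
       [ -n "${name}" ] || continue
       cap="${cap// /}"              # drop the space that follows the comma
       cards+=("${name}")
       card_arches+=("${cap//./}")   # "8.6" -> "86"
     done
   }

   # Sample lines, assumed to match the CSV format described above
   parse_compute_cap <<< $'NVIDIA GeForce RTX 3090, 8.6\nTesla T4, 7.5'
   echo "${cards[1]}: ${card_arches[1]}"
   # prints: Tesla T4: 75

   # Against a real driver (requires nvidia-smi on the PATH):
   #   parse_compute_cap < <(nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader)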

.. function:: wmic_cuda_capability

:Output: * :var:`cuda_info.bsh CUDA_CARDS`

:Parameters: * ``WMIC`` - Optional path to a specific ``wmic``. Default: ``wmic.exe`` (using the system path)

:Return Value: * ``0`` - No errors
               * Non-zero - If the output of ``wmic.exe`` does not contain any CUDA cards, then something likely failed.

.. function:: nvidia_docker_plugin_cuda_capability

The deprecated nvidia-docker v1 API actually hosted GPU information, which was useful for determining the CUDA Arches. This is rarely used now.

:Parameters: [``NV_HOST``] - Environment variable to optionally specify a custom NVIDIA host for the GPUs. Default: ``http://localhost:3476``

:Output: * :var:`cuda_info.bsh CUDA_CARD_FAMILIES`
         * :var:`cuda_info.bsh CUDA_CARD_ARCHES`
         * :var:`cuda_info.bsh CUDA_CARDS`

.. function:: cuda_arch_to_cuda_family

Determines CUDA Family names based on the CUDA Arch stored in ``CUDA_CARD_ARCHES``

:Parameters: :var:`cuda_info.bsh CUDA_CARD_ARCHES`

:Output: :var:`cuda_info.bsh CUDA_CARD_FAMILIES`

.. function:: discover_cuda_info

Get CUDA info about each card

:Output: * :var:`cuda_info.bsh CUDA_CARDS`
         * :var:`cuda_info.bsh CUDA_CARD_ARCHES`
         * :var:`cuda_info.bsh CUDA_CARD_FAMILIES`

.. function:: cuda_capabilities

:Parameters: ``$1`` - The CUDA version to check against. This is typically the ``nvcc`` version, but the docs version should work too. In a pinch, CUDA version will work, to some extent.

:Output: * :var:`cuda_info.bsh CUDA_ARCHES`
         * :var:`cuda_info.bsh CUDA_CODES`
         * :var:`cuda_info.bsh CUDA_LTOS` - Starting in CUDA 11, Link-Time Optimization architectures were added
         * :var:`cuda_info.bsh CUDA_MINIMAL_DRIVER_VERSION` - Minimal required NVIDIA Driver needed to run this version of CUDA
         * :var:`cuda_info.bsh CUDA_COMPATIBLE_DRIVER_VERSION` - Starting in CUDA 10, you can install a ``cuda-compat`` package that supports a different version of the CUDA library that runs on older drivers up to that version
         * :var:`cuda_info.bsh CUDA_DEPRECATED`

Determine compiler capabilities for a specific CUDA development kit (CDK)

.. function:: suggested_architectures

:Output: * :var:`cuda_info.bsh CUDA_SUGGESTED_ARCHES`
         * :var:`cuda_info.bsh CUDA_SUGGESTED_CODES`
         * :var:`cuda_info.bsh CUDA_SUGGESTED_PTX`
         * :var:`cuda_info.bsh CUDA_FORWARD_PTX`

Calculate suggested architectures

.. function:: cmake_cuda_flags

:Parameters: * :var:`cuda_info.bsh CUDA_SUGGESTED_ARCHES`
             * :var:`cuda_info.bsh CUDA_SUGGESTED_CODES`
             * [:var:`cuda_info.bsh CUDA_SUGGESTED_PTX`]
:Output: *stdout* - echoes out the value of the target_CUDA_architectures

Generate CUDA flags for CMake

Modern CMake installs include a FindCUDA.cmake script which calls the select_compute_arch.cmake script (https://goo.gl/uZvAjR). It uses a limited version of the tables that :file:`cuda_info.bsh` uses and is prone to being out of date.

This function will calculate the suggested value of target_CUDA_architectures for CMake's:

  FindCUDA.cmake:select_compute_arch.cmake:CUDA_SELECT_NVCC_ARCH_FLAGS

You will need to find where this is used and set the variable accordingly.

.. rubric:: Example

For example, PyTorch's CMake contains:

.. code-block:: cmake

  CUDA_SELECT_NVCC_ARCH_FLAGS(NVCC_FLAGS_EXTRA $ENV{TORCH_CUDA_ARCH_LIST})

Setting the environment variable ``TORCH_CUDA_ARCH_LIST`` to the output of :func:`cmake_cuda_flags` will result in using the desired CUDA architecture and code versions.

To add the :var:`cuda_info.bsh CUDA_FORWARD_PTX`, run:

.. code-block:: bash

  CUDA_SUGGESTED_PTX+=(${CUDA_FORWARD_PTX})

before calling :func:`cmake_cuda_flags`.

.. function:: cmake_cuda_architectures

:Parameters: * :var:`cuda_info.bsh CUDA_SUGGESTED_ARCHES`
             * [:var:`cuda_info.bsh CUDA_SUGGESTED_PTX`]
:Output: *stdout* - semi-colon delimited string of CUDA capabilities for cmake

Generate CUDA capabilities suitable for the ``CMAKE_CUDA_ARCHITECTURES`` variable.
See the CMake docs for more information: https://cmake.org/cmake/help/latest/variable/CMAKE_CUDA_ARCHITECTURES.html


.. function:: torch_cuda_arch_list

:Parameters: * :var:`cuda_info.bsh CUDA_SUGGESTED_ARCHES`
             * [:var:`cuda_info.bsh CUDA_SUGGESTED_PTX`]
:Output: *stdout* - comma delimited string of CUDA architectures for pytorch

Generate CUDA capabilities suitable for the ``TORCH_CUDA_ARCH_LIST`` environment variable.
See the PyTorch docs for more information: https://pytorch.org/docs/stable/cpp_extension.html#torch.utils.cpp_extension.CUDAExtension


.. function:: torch_cuda_arch_list_from_cmake

:Arguments: * ``$1`` - delimited string of cuda architectures
:Output: *stdout* - string containing ``TORCH_CUDA_ARCH_LIST`` statement

Create ``TORCH_CUDA_ARCH_LIST`` from ``CMAKE_CUDA_ARCHITECTURES`` statement.

.. rubric:: Example

For example, an input of ``"80-real 86-virtual"`` would produce the output ``"8.0 8.6+PTX"``
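
The conversion in the example above can be sketched as follows. The function ``sketch_torch_arch_list`` is a hypothetical stand-in, not the implementation in :file:`cuda_info.bsh`. Accepting spaces, commas, or semicolons as delimiters, and treating an entry without a suffix like a ``-real`` entry, are assumptions made for the illustration.

.. code-block:: bash

   # Hypothetical helper: convert CMake-style entries such as "86-virtual"
   # into PyTorch-style entries such as "8.6+PTX".
   sketch_torch_arch_list()
   {
     local -a entries out=()
     local entry num
     # Split the input on spaces, commas, or semicolons
     IFS=' ,;' read -r -a entries <<< "$1"
     for entry in ${entries[@]+"${entries[@]}"}; do
       [ -n "${entry}" ] || continue
       num="${entry%%-*}"                           # "86-virtual" -> "86"
       num="${num:0:${#num}-1}.${num:${#num}-1}"    # "86" -> "8.6"
       case "${entry}" in
         *-virtual) out+=("${num}+PTX") ;;          # request PTX
         *)         out+=("${num}") ;;              # "-real" or no suffix
       esac
     done
     echo "${out[*]-}"
   }

   sketch_torch_arch_list "80-real 86-virtual"
   # prints: 8.0 8.6+PTX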

.. function:: tcnn_cuda_architectures

:Parameters: * :var:`cuda_info.bsh CUDA_SUGGESTED_ARCHES`
             * [:var:`cuda_info.bsh CUDA_SUGGESTED_PTX`]
:Output: *stdout* - comma delimited string of CUDA architectures for tiny-cuda-nn

Generate CUDA architectures for tiny-cuda-nn, suitable for the ``TCNN_CUDA_ARCHITECTURES`` environment variable. Note tiny-cuda-nn will always build both binaries and PTX intermediate code for each specified architecture.


.. function:: tcnn_cuda_architectures_from_cmake

:Arguments: * ``$1`` - delimited string of cuda architectures
:Output: *stdout* - string containing ``TCNN_CUDA_ARCHITECTURES`` statement

Create ``TCNN_CUDA_ARCHITECTURES`` from ``CMAKE_CUDA_ARCHITECTURES`` statement.

.. rubric:: Example

For example, an input of ``"80-real 86-virtual"`` would produce the output ``"80 86"``

.. function:: nvcc_gencodes

:Arguments: * ``$1`` - delimited string of cuda architectures
:Output: *stdout* - string containing nvcc gencode statements

Create nvcc gencode statements from input array of cuda capabilities.

.. rubric:: Example

For example, an input of ``"75 86+PTX"`` would produce the output

.. code-block:: bash

  -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_86,code=compute_86
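
A minimal sketch that reproduces the output above is shown below. The name ``sketch_nvcc_gencodes`` is hypothetical and the actual function may handle more input forms; this version only understands plain entries and entries with a ``+PTX`` suffix.

.. code-block:: bash

   # Hypothetical helper: turn entries such as "75" or "86+PTX" into
   # nvcc -gencode flags.
   sketch_nvcc_gencodes()
   {
     local -a entries out=()
     local entry num
     # Split the input on spaces, commas, or semicolons
     IFS=' ,;' read -r -a entries <<< "$1"
     for entry in ${entries[@]+"${entries[@]}"}; do
       [ -n "${entry}" ] || continue
       num="${entry%+PTX}"
       if [ "${entry}" != "${num}" ]; then
         # "+PTX" entries embed PTX for the virtual architecture
         out+=("-gencode=arch=compute_${num},code=compute_${num}")
       else
         # Plain entries build native code for the real architecture
         out+=("-gencode=arch=compute_${num},code=sm_${num}")
       fi
     done
     echo "${out[*]-}"
   }

   sketch_nvcc_gencodes "75 86+PTX"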


.. function:: discover_cuda_all

Helper function to call all the discovery code in one call.

There are two methods for CUDA device discovery (in order):

1. Using ``deviceQuery`` (available here https://goo.gl/ocBgPU)

   * Looks for ``${DEVICE_QUERY-deviceQuery}`` on the ``PATH``

2. Using the nvidia-docker-plugin to get and parse GPU information

   * Discovered using either ``NV_HOST`` or by checking whether nvidia-docker-plugin is running locally, using ``pgrep`` or ``ps``

When running in a Docker container, ``deviceQuery`` is the preferred method. ``NV_HOST`` could be used, but that involves telling the container the IP of the host, or using a shared network mode in order to use localhost (which is not recommended for production). Attempting to discover nvidia-docker-plugin will not work in a container.

:Output: * :var:`cuda_info.bsh CUDA_VERSIONS`
         * :var:`cuda_info.bsh CUDA_VERSION`
         * :var:`cuda_info.bsh CUDA_CARDS`
         * :var:`cuda_info.bsh CUDA_CARD_ARCHES`
         * :var:`cuda_info.bsh CUDA_CARD_FAMILIES`
         * :var:`cuda_info.bsh CUDA_ARCHES`
         * :var:`cuda_info.bsh CUDA_CODES`
         * :var:`cuda_info.bsh CUDA_LTOS`
         * :var:`cuda_info.bsh CUDA_MINIMAL_DRIVER_VERSION`
         * :var:`cuda_info.bsh CUDA_COMPATIBLE_DRIVER_VERSION`
         * :var:`cuda_info.bsh CUDA_DEPRECATED`
         * :var:`cuda_info.bsh CUDA_SUGGESTED_ARCHES`
         * :var:`cuda_info.bsh CUDA_SUGGESTED_CODES`
         * :var:`cuda_info.bsh CUDA_SUGGESTED_PTX`
         * :var:`cuda_info.bsh CUDA_FORWARD_PTX`

.. note::

   If ``deviceQuery`` is not used, then an internal lookup table is used, but it only supports Tesla, Fermi, Kepler, Maxwell, Pascal, Volta, Turing, and Ampere. Additional family names need to be added as they are released.

.. function:: has_gpu_device

Tells you if the computer has any NVIDIA GPUs. This is checking if they physically exist, not if they are currently working or have working drivers. This is usually used to determine if you need to install drivers, not if you have GPUs ready to use.

:Return Value: * ``0`` - The machine has an NVIDIA GPU.
               * ``1`` - No NVIDIA GPUs were found.

.. seealso::
  :func:`is_gpu_setup`

.. function:: is_gpu_setup

Checks to see if you have any GPUs installed and working.

:Return Value: * ``0`` - There is at least one NVIDIA GPU available to use.
               * ``1`` - No NVIDIA GPUs were available.

- If there is a problem with the NVIDIA drivers resulting in the kernel module not loading, then this check will return false.
- If you are in a Docker container and did not route the GPU in correctly, it will return false.

.. seealso::
  :func:`has_gpu_device`