
nvcc

Compile CUDA code for NVIDIA GPUs

TLDR

Compile a CUDA program

$ nvcc [path/to/source.cu] [[-o|--output-file]] [path/to/executable]

Generate debug information
$ nvcc [path/to/source.cu] [[-o|--output-file]] [path/to/executable] [[-g|--debug]] [[-G|--device-debug]]

Include libraries from a different path
$ nvcc [path/to/source.cu] [[-o|--output-file]] [path/to/executable] [[-I|--include-path]] [path/to/includes] [[-L|--library-path]] [path/to/library] [[-l|--library]] [library_name]

Specify the compute capability for a specific GPU architecture
$ nvcc [path/to/source.cu] [[-o|--output-file]] [path/to/executable] [[-gencode|--generate-code]] arch=[arch_name],code=[gpu_code_name]

SYNOPSIS

nvcc [options] <source_file>... [-o <output_file>]

PARAMETERS

-o <file>
    Specify the name of the output file.

-c
    Compile and assemble only; do not link. Produces object files.

-arch=sm_XX
    Specify the target NVIDIA GPU architecture (e.g., sm_70 for Volta, sm_80 for Ampere).

-g
    Generate host debugging information (for CPU code).

-G
    Generate device debugging information (for GPU code). Disables most device-code optimizations.

-I <dir>
    Add a directory to the list of paths searched for include files.

-L <dir>
    Add a directory to the list of paths searched for libraries.

-l <lib>
    Link with the specified library (e.g., -lcudart).

-Xcompiler <opts>
    Pass options directly to the host C++ compiler.

-Xlinker <opts>
    Pass options directly to the linker.

-Xptxas <opts>
    Pass options directly to the PTX optimizing assembler (ptxas).

-maxrregcount=N
    Specify the maximum number of registers available to each GPU thread.

-rdc=true
    Generate relocatable device code, required for separate compilation and device-side linking across translation units.
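
A sketch combining several of these options (the file names, include path, and sm_80 target are illustrative; nvcc links the CUDA runtime by default, so -lcudart appears only to illustrate -l):

$ nvcc -c -arch=sm_80 -I include/ -Xcompiler -fPIC kernel.cu -o kernel.o
$ nvcc kernel.o -o app -L /usr/local/cuda/lib64 -lcudart
$ nvcc -dc a.cu b.cu               # -dc (--device-c) is shorthand for -rdc=true -c
$ nvcc a.o b.o -o app              # nvcc device-links the relocatable device code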

DESCRIPTION

nvcc is the NVIDIA CUDA Compiler driver, a powerful tool for compiling CUDA C/C++ source files. It acts as a wrapper around a host C++ compiler (like gcc or MSVC) and the NVIDIA PTX assembler/optimizing compiler (ptxas).

Its primary function is to orchestrate the entire compilation process, which involves separating host code from device code. It first extracts the device code, compiles it into PTX (Parallel Thread Execution) assembly or SASS (GPU assembly code), and then uses the host compiler to compile the remaining host code. Finally, it links all compiled components—host object files, device object files, and necessary CUDA runtime libraries—to produce an executable or a shared library.

nvcc supports various compilation targets, including different NVIDIA GPU architectures, and allows developers to optimize code for specific hardware. It is an essential tool for building high-performance computing applications that leverage the parallel processing capabilities of NVIDIA GPUs.
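
For illustration, a minimal CUDA source file in which nvcc separates the __global__ kernel (device code) from main (host code); the file and kernel names are hypothetical:

// add.cu: host and device code in one translation unit
#include <cstdio>

__global__ void add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];                  // one element per thread
}

int main() {
    const int n = 256;
    float *a, *b, *c;
    cudaMallocManaged(&a, n * sizeof(float));       // unified memory, visible to host and device
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }
    add<<<(n + 127) / 128, 128>>>(a, b, c, n);      // launch 2 blocks of 128 threads
    cudaDeviceSynchronize();                        // wait for the GPU to finish
    printf("c[0] = %f\n", c[0]);                    // expect 3.000000
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}

Compiled and run with, e.g., $ nvcc add.cu -o add && ./add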

CAVEATS

Building CUDA applications can be complex due to the interplay between host and device compilation environments. Debugging GPU code often requires specialized tools (like NVIDIA Nsight). Compatibility between CUDA toolkit versions, GPU drivers, and target hardware architectures must be carefully managed. Cross-compilation for different CPU architectures (e.g., ARM) or GPU architectures also introduces additional complexities.

CUDA COMPILATION WORKFLOW

A typical nvcc compilation involves several stages:
1. Preprocessing: Handles directives like #include.
2. CUDA Compilation: nvcc extracts device code and compiles it to PTX (virtual assembly) or SASS (native GPU assembly). This may involve multiple PTX generations for different architectures.
3. Host Compilation: The remaining host code is compiled by the underlying host C++ compiler (e.g., gcc).
4. Linking: Host object files, device object files, and necessary CUDA runtime libraries are linked to form the final executable or library.
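
A sketch of how the individual stages can be invoked or inspected, assuming a source file named kernel.cu:

$ nvcc -ptx kernel.cu              # stop after device compilation; emits kernel.ptx
$ nvcc -c kernel.cu -o kernel.o    # compile host and device code into one object file
$ nvcc kernel.o -o app             # link against the CUDA runtime
$ nvcc --keep kernel.cu -o app     # keep all intermediate files (.ptx, .cubin, ...) for inspection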

JUST-IN-TIME (JIT) COMPILATION

If a compiled binary embeds PTX code, the CUDA driver can JIT-compile that PTX to SASS at runtime for a GPU whose native code is not present in the binary. This allows binaries built before a new architecture existed to run on that hardware, albeit with some overhead the first time each kernel is launched (the driver caches the result).
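
To make a binary eligible for JIT compilation on GPUs newer than those it was built for, PTX for a virtual architecture can be embedded alongside native SASS by repeating -gencode with code=compute_XX (the sm_80/compute_80 pair is illustrative):

$ nvcc source.cu -o app -gencode arch=compute_80,code=sm_80 -gencode arch=compute_80,code=compute_80

The first -gencode embeds native SASS for Ampere; the second embeds the PTX that the driver can JIT-compile for later architectures.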

HISTORY

nvcc emerged with the introduction of NVIDIA's CUDA platform in 2007, marking a pivotal shift in general-purpose GPU computing. Prior to CUDA, GPGPU programming was often done using graphics APIs. nvcc provided a more familiar C/C++ development environment for parallel programming, abstracting away much of the low-level GPU details. Since its inception, nvcc has evolved significantly with each new generation of NVIDIA GPUs and CUDA toolkit releases, adding support for new architectural features, programming models (e.g., dynamic parallelism, cooperative groups), and compilation optimizations, solidifying its role as the cornerstone of CUDA development.

SEE ALSO

gcc(1), g++(1), ld(1), make(1), cmake(1)
