LinuxCommandLibrary

nvcc

TLDR

Compile a CUDA program
$ nvcc [program.cu] -o [program]
Compile to object file
$ nvcc -c [kernel.cu] -o [kernel.o]
Compile with specific GPU architecture
$ nvcc -arch=sm_[75] [program.cu] -o [program]
Generate PTX code
$ nvcc -ptx [kernel.cu]
Compile with optimization
$ nvcc -O3 [program.cu] -o [program]
Compile with debug symbols
$ nvcc -g -G [program.cu] -o [program]
Link with external library
$ nvcc [program.cu] -o [program] -l[cublas]
Show compilation stages
$ nvcc --dryrun [program.cu]

SYNOPSIS

nvcc [-arch=arch] [-o output] [-c] [-g] [options] files

DESCRIPTION

nvcc is NVIDIA's CUDA compiler driver. It compiles CUDA C/C++ code that runs on NVIDIA GPUs along with host code that runs on the CPU.
Compilation separates device code (kernels running on GPU) from host code (CPU). Device code compiles to PTX intermediate representation or directly to SASS (GPU machine code).
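Architecture flags (-arch) target specific GPU generations. Binary code (SASS) built for one architecture runs only on GPUs of the same major compute capability with an equal or higher minor version. Forward compatibility with newer generations relies on embedded PTX, which the driver JIT-compiles at runtime.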
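As an illustration of targeting several GPU generations in one build, the sketch below assembles -gencode pairs into a single nvcc command. The architecture list (75 and 86), the NEWEST variable, and the file names program.cu and program are assumptions chosen for the example, not values from any particular project.

```shell
#!/bin/sh
# Assumed targets: Turing (sm_75) and Ampere (sm_86); adjust for your GPUs.
ARCHS="75 86"
NEWEST=86

GENCODE=""
for a in $ARCHS; do
  # One SASS binary per listed architecture.
  GENCODE="$GENCODE -gencode arch=compute_$a,code=sm_$a"
done
# Also embed PTX for the newest target so later GPUs can JIT-compile it.
GENCODE="$GENCODE -gencode arch=compute_$NEWEST,code=compute_$NEWEST"

# program.cu and program are placeholder names.
CMD="nvcc$GENCODE program.cu -o program"
echo "$CMD"

# Run only when nvcc and the source file are actually present.
if command -v nvcc >/dev/null 2>&1 && [ -f program.cu ]; then
  $CMD
fi
```

Each code=sm_XX entry adds machine code for that generation, while the final code=compute_XX entry adds PTX as a fallback for GPUs newer than any listed target.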
The compiler integrates with host compilers (gcc, clang, MSVC) for CPU code. Separate compilation allows mixing CUDA with regular C++ in large projects.
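A minimal sketch of mixing a CUDA source file with an ordinary C++ file follows. The kernel, the launch_hello wrapper, and all file names are hypothetical; the script assumes g++ as the host compiler and only builds when both nvcc and g++ are found on PATH.

```shell
#!/bin/sh
# Work in a scratch directory so nothing existing is overwritten.
WORKDIR=$(mktemp -d)
cd "$WORKDIR" || exit 1

# Device code plus a host-callable wrapper (hypothetical example kernel).
cat > kernel.cu <<'EOF'
#include <cstdio>

__global__ void hello() {
    printf("hello from GPU thread %d\n", threadIdx.x);
}

// Plain C++ entry point that main.cpp can call without CUDA syntax.
void launch_hello() {
    hello<<<1, 4>>>();
    cudaDeviceSynchronize();
}
EOF

# Ordinary C++ with no CUDA headers or syntax.
cat > main.cpp <<'EOF'
void launch_hello();

int main() {
    launch_hello();
    return 0;
}
EOF

if command -v nvcc >/dev/null 2>&1 && command -v g++ >/dev/null 2>&1; then
  nvcc -c kernel.cu -o kernel.o   # nvcc compiles both device and host parts of the .cu file
  g++  -c main.cpp  -o main.o     # the host compiler alone builds the plain C++ file
  nvcc kernel.o main.o -o app     # linking through nvcc pulls in the CUDA runtime
else
  echo "nvcc or g++ not found; sources written to $WORKDIR"
fi
```

Keeping kernel launches behind a plain function such as launch_hello means only the .cu files need nvcc, and the rest of the project can stay with its usual compiler.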
Debug builds (-g -G) enable debugging with cuda-gdb; -G also disables device code optimizations. The -O option sets the host code optimization level; device code is optimized by default.
CUDA libraries (cuBLAS, cuDNN, cuFFT) link like regular libraries. Header paths and library paths may need specification for non-standard installations.
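For a non-standard installation, the header and library paths can be passed explicitly, roughly as sketched here. The CUDA_HOME variable and its /usr/local/cuda default are assumptions (a common Linux layout, not a guarantee), and program.cu is a placeholder file name.

```shell
#!/bin/sh
# Assumed install prefix; /usr/local/cuda is common on Linux but not universal.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"

# -I adds a header search path, -L a library search path, -l names the library.
CMD="nvcc program.cu -o program -I$CUDA_HOME/include -L$CUDA_HOME/lib64 -lcublas"
echo "$CMD"

# Run only when nvcc and the source file are actually present.
if command -v nvcc >/dev/null 2>&1 && [ -f program.cu ]; then
  $CMD
fi
```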

PARAMETERS

-o FILE
Output file.
-c
Compile only, don't link.
-arch ARCH
GPU architecture (sm_50, sm_75, sm_86, etc.).
-code CODE
GPU code to generate: real (sm_XX) or virtual (compute_XX) targets.
-gencode SPEC
Architecture/code pair, e.g. arch=compute_75,code=sm_75; may be repeated.
-ptx
Generate PTX assembly.
-g
Host debug symbols.
-G
Device debug symbols.
-O LEVEL
Host code optimization level (0-3).
-I DIR
Include directory.
-L DIR
Library directory.
-l LIB
Link library.
--dryrun
Show commands without executing.
-v, --verbose
Verbose output.
--version
Show version.

CAVEATS

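Compiling requires the CUDA Toolkit and a supported host compiler; running the result requires an NVIDIA GPU and driver. A binary with no code for the installed GPU's architecture fails at runtime. Debug builds (-G) run much slower because device optimizations are disabled. High register usage per thread limits occupancy.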

HISTORY

nvcc was introduced by NVIDIA with CUDA around 2007. By providing a C-like language for programming NVIDIA GPUs, it enabled general-purpose GPU computing and extended GPUs from graphics-only devices to general computation.

SEE ALSO

cuda-gdb(1), nvidia-smi(1), gcc(1), clang(1)
