nvcc
TLDR
Compile CUDA program
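A minimal sketch of the basic invocation, assuming a hypothetical source file named program.cu in the current directory. The command is built in a variable so it can be printed first, and it only runs when nvcc and the file are actually present.

```shell
# Hypothetical file names; substitute your own.
SRC=program.cu
OUT=program

# Compile and link host and device code in one step.
CMD="nvcc -o $OUT $SRC"
echo "$CMD"

# Run the command only when nvcc and the source file exist.
if command -v nvcc >/dev/null 2>&1 && [ -f "$SRC" ]; then
    $CMD
fi
```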
SYNOPSIS
nvcc [-arch=arch] [-o output] [-c] [-g] [options] files
DESCRIPTION
nvcc is NVIDIA's CUDA compiler driver. It compiles CUDA C/C++ code that runs on NVIDIA GPUs along with host code that runs on the CPU.
Compilation separates device code (kernels running on GPU) from host code (CPU). Device code compiles to PTX intermediate representation or directly to SASS (GPU machine code).
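The split between device and host code can be illustrated with a small sketch. The file saxpy.cu, its kernel, and the sm_75 target are assumptions made for this example; -cubin is an nvcc flag that is not listed under PARAMETERS here. The kernel marked __global__ is device code, main() is host code, and the two commands ask nvcc for the PTX and the SASS (cubin) form of the device part.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
WORK=$(mktemp -d)

# Write a tiny CUDA source file: one kernel (device) plus main() (host).
cat > "$WORK/saxpy.cu" <<'EOF'
#include <cstdio>

// Device code: runs on the GPU, one thread per element.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Host code: runs on the CPU and launches the kernel.
int main() {
    const int n = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();
    std::printf("y[0] = %f\n", y[0]);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
EOF

# Device code as PTX (intermediate representation).
PTX_CMD="nvcc -ptx -o $WORK/saxpy.ptx $WORK/saxpy.cu"
# Device code as SASS for one assumed architecture (sm_75).
CUBIN_CMD="nvcc -arch=sm_75 -cubin -o $WORK/saxpy.cubin $WORK/saxpy.cu"
echo "$PTX_CMD"
echo "$CUBIN_CMD"

# Run only when nvcc is installed.
if command -v nvcc >/dev/null 2>&1; then
    $PTX_CMD && $CUBIN_CMD
fi
```

The resulting saxpy.ptx is a text file that can be opened in an editor, while the cubin is a binary file.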
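Architecture flags (-arch, -gencode) select the target GPU generation. SASS built for one architecture runs only on GPUs with the same major compute capability and an equal or higher minor revision, so machine code for an older architecture does not automatically run on newer GPUs. Forward compatibility comes from embedding PTX, which the driver JIT-compiles for the installed GPU at load time.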
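Combining architecture-specific machine code with forward-compatible PTX might look like the following sketch. The targets (sm_75 for Turing, sm_86 for Ampere) and the file name program.cu are assumptions chosen for illustration.

```shell
# Hypothetical source file.
SRC=program.cu

# SASS for two specific GPU generations...
GENCODE="-gencode arch=compute_75,code=sm_75"
GENCODE="$GENCODE -gencode arch=compute_86,code=sm_86"
# ...plus PTX for compute_86, which the driver can JIT-compile on newer GPUs.
GENCODE="$GENCODE -gencode arch=compute_86,code=compute_86"

CMD="nvcc $GENCODE -o program $SRC"
echo "$CMD"

# Run only when nvcc and the source file exist.
if command -v nvcc >/dev/null 2>&1 && [ -f "$SRC" ]; then
    $CMD
fi
```

Each extra -gencode pair increases compile time and binary size, so the list is usually limited to the GPUs the program is expected to run on.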
The compiler integrates with host compilers (gcc, clang, MSVC) for CPU code. Separate compilation allows mixing CUDA with regular C++ in large projects.
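A rough sketch of separate compilation, assuming two hypothetical files (kernels.cu with the CUDA code, main.cpp with plain C++) and g++ as the host compiler. Only the .cu file goes through nvcc for compilation, and nvcc performs the final link so the CUDA runtime is added automatically.

```shell
# Step 1: compile the CUDA translation unit to an object file.
STEP1="nvcc -c -o kernels.o kernels.cu"
# Step 2: compile ordinary C++ with the host compiler directly.
STEP2="g++ -c -o main.o main.cpp"
# Step 3: link everything with nvcc.
STEP3="nvcc -o app kernels.o main.o"

echo "$STEP1"
echo "$STEP2"
echo "$STEP3"

# Run only when both compilers and both source files exist.
if command -v nvcc >/dev/null 2>&1 && command -v g++ >/dev/null 2>&1 \
    && [ -f kernels.cu ] && [ -f main.cpp ]; then
    $STEP1 && $STEP2 && $STEP3
fi
```

This sketch does not cover kernels that call device functions defined in another .cu file; that case additionally needs relocatable device code (nvcc -dc).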
Debug builds (-g for host code, -G for device code) enable debugging with cuda-gdb; -G also turns off device-code optimizations. Optimization levels affect both host and device code performance.
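One way to keep a debug build and an optimized build side by side is sketched below. The file and output names are hypothetical.

```shell
# Hypothetical source file.
SRC=program.cu

# Debug build: host symbols (-g), device symbols (-G), no host optimization.
DEBUG_CMD="nvcc -g -G -O0 -o program_debug $SRC"
# Optimized build for normal use.
RELEASE_CMD="nvcc -O3 -o program $SRC"

echo "$DEBUG_CMD"
echo "$RELEASE_CMD"

# Run only when nvcc and the source file exist.
if command -v nvcc >/dev/null 2>&1 && [ -f "$SRC" ]; then
    $DEBUG_CMD && $RELEASE_CMD
fi

# The debug binary can then be inspected with: cuda-gdb ./program_debug
```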
CUDA libraries (cuBLAS, cuDNN, cuFFT) link like regular libraries with -l. Header and library paths may need to be specified with -I and -L for non-standard installations.
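Linking against a CUDA library might look like this sketch. The source file gemm.cu is hypothetical, and the CUDA_HOME variable with its /usr/local/cuda default is an assumption about where the toolkit lives; the explicit -I and -L paths are only needed when headers or libraries sit outside the locations nvcc already searches.

```shell
# Assumed toolkit location; override by exporting CUDA_HOME beforehand.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"
# Hypothetical source file that calls cuBLAS.
SRC=gemm.cu

# Add header and library search paths, then link cuBLAS with -l.
CMD="nvcc -I$CUDA_HOME/include -L$CUDA_HOME/lib64 -o gemm $SRC -lcublas"
echo "$CMD"

# Run only when nvcc and the source file exist.
if command -v nvcc >/dev/null 2>&1 && [ -f "$SRC" ]; then
    $CMD
fi
```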
PARAMETERS
-o FILE
Output file.
-c
Compile only, don't link.
-arch ARCH
GPU architecture (sm_50, sm_75, sm_86, etc.).
-code CODE
GPU code to generate for the chosen architecture (for example sm_86 for SASS or compute_86 for PTX).
-gencode SPEC
Architecture/code pair (arch=...,code=...); may be repeated to target several GPUs.
-ptx
Generate PTX assembly.
-g
Host debug symbols.
-G
Device debug symbols.
-O LEVEL
Optimization level (0-3).
-I DIR
Include directory.
-L DIR
Library directory.
-l LIB
Link library.
--dryrun
Show commands without executing.
-v, --verbose
Verbose output.
--version
Show version.
CAVEATS
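Running compiled programs requires an NVIDIA GPU and a compatible driver. A binary that contains no code for the installed GPU's architecture fails at runtime, not at compile time. Debug builds (-G) run much slower than optimized builds. High per-thread register usage limits occupancy.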
HISTORY
nvcc was introduced with CUDA by NVIDIA around 2007. It enabled general-purpose GPU computing by providing a C-like language for programming NVIDIA GPUs, turning them from graphics-only processors into general-purpose compute devices.
SEE ALSO
cuda-gdb(1), nvidia-smi(1), gcc(1), clang(1)


