nvcc
NVIDIA's CUDA compiler driver
SYNOPSIS
nvcc [-arch=arch] [-o output] [-c] [-g] [options] files
DESCRIPTION
nvcc is NVIDIA's CUDA compiler driver. It compiles CUDA C/C++ code that runs on NVIDIA GPUs along with host code that runs on the CPU.

Compilation separates device code (kernels running on the GPU) from host code (running on the CPU). Device code compiles to PTX, an intermediate representation, or directly to SASS (GPU machine code).

Architecture flags (-arch) target specific GPU generations. Use `-arch=native` to auto-detect visible GPUs, or `-arch=all` to compile for all supported architectures. Embedding PTX in the binary provides forward compatibility: the driver JIT-compiles it at runtime for GPUs newer than those targeted at build time.

The compiler hands CPU code to a host compiler (gcc, clang, MSVC). Separate compilation allows mixing CUDA with regular C++ in large projects.

Debug builds (-g -G) enable debugging with cuda-gdb. The -O option sets the optimization level for host code; device code is optimized by default unless -G is given.

CUDA libraries (cuBLAS, cuDNN, cuFFT) link like regular libraries. Header and library paths may need to be specified for non-standard installations.
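As a rough illustration of the host/device split described above, the following sketch writes a small CUDA source file and builds it. The file name `vec_add.cu`, the kernel name, and the `sm_75` target are assumptions chosen for the example; substitute the architecture of your own GPU. The build step runs only if nvcc is on the PATH.

```shell
#!/bin/sh
# Sketch: write a tiny CUDA source file, then build it with nvcc if available.
workdir=$(mktemp -d)
src="$workdir/vec_add.cu"   # hypothetical file name

cat > "$src" <<'EOF'
#include <cstdio>

// Device code: each thread adds one pair of elements.
__global__ void vec_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Host code: allocate unified memory, launch the kernel, print one result.
int main() {
    const int n = 1024;
    float *a, *b, *c;
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }
    vec_add<<<(n + 255) / 256, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();
    std::printf("c[0] = %f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
EOF

# sm_75 is only an example target; -arch=native would need a visible GPU at build time.
build_cmd="nvcc -arch=sm_75 -O2 -o $workdir/vec_add $src"
echo "$build_cmd"

if command -v nvcc >/dev/null 2>&1; then
    $build_cmd
else
    echo "nvcc not found; skipping build" >&2
fi
```

Running the resulting `vec_add` binary requires a GPU that supports the chosen architecture.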
PARAMETERS
-o FILE
Output file.
-c
Compile only, don't link.
-arch ARCH
GPU architecture (sm_50, sm_75, sm_86, etc.).
-code CODE
GPU code to generate for the chosen architecture (e.g., sm_75 for SASS, compute_75 for PTX).
-gencode SPEC
Architecture/code pair (e.g., arch=compute_75,code=sm_75). May be repeated to target several GPUs.
-ptx
Generate PTX assembly.
-g
Host debug symbols.
-G
Device debug symbols (disables device code optimizations).
-O LEVEL
Optimization level for host code (0-3).
-I DIR
Include directory.
-L DIR
Library directory.
-l LIB
Link library.
--dryrun
Show commands without executing.
-Xcompiler options
Pass options directly to the host compiler.
-std standard
C++ standard (e.g., c++14, c++17, c++20). Also accepted as `--std`.
-dc
Compile to relocatable device code (enables separate compilation).
-rdc true|false
Enable or disable relocatable device code.
-dlink
Link relocatable device code objects.
-ccbin PATH
Specify the host compiler binary (e.g., `/usr/bin/g++`).
-Xlinker options
Pass options directly to the host linker.
-lineinfo
Generate line-number information for device code (useful for profilers).
-use_fast_math
Enable fast math optimizations (implies `-ftz=true -prec-div=false -prec-sqrt=false`).
-keep
Retain intermediate compilation files.
-t N
Parallelize compilation across N threads.
-v, --verbose
Verbose output.
--version
Show version.
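The -gencode, -dc, and -dlink options are commonly combined. The sketch below assembles the command lines for a multi-architecture build with separate compilation. The file names (`kernels.cu`, `main.cu`, `app`), the chosen architectures, and the `CUDA_HOME` fallback path are assumptions for illustration. Commands are echoed, and executed only when nvcc and the source files are present.

```shell
#!/bin/sh
# Sketch: fat binary plus separate compilation. All file names are hypothetical.

# PTX for compute_86 lets newer GPUs JIT-compile the kernels at load time.
gencode_flags="-gencode arch=compute_86,code=compute_86"
# SASS for each GPU generation we want to run on without JIT.
for cc in 75 86; do
    gencode_flags="$gencode_flags -gencode arch=compute_${cc},code=sm_${cc}"
done

# Echo each command; run it only if the toolchain and sources are available.
run() {
    echo "+ $*"
    if command -v nvcc >/dev/null 2>&1 && [ -f kernels.cu ] && [ -f main.cu ]; then
        "$@"
    fi
}

# 1. Compile each file to an object containing relocatable device code.
run nvcc $gencode_flags -dc kernels.cu -o kernels.o
run nvcc $gencode_flags -dc main.cu -o main.o

# 2. Device link: resolve device symbols across the objects.
run nvcc $gencode_flags -dlink kernels.o main.o -o device_link.o

# 3. Host link with the regular C++ linker; the CUDA runtime is linked explicitly.
cuda_lib="${CUDA_HOME:-/usr/local/cuda}/lib64"   # assumed install location
run g++ kernels.o main.o device_link.o -L"$cuda_lib" -lcudart -o app
```

Letting nvcc perform the final link instead (`nvcc $gencode_flags kernels.o main.o -o app`) makes the device-link step implicit, which is often simpler for small projects.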
CAVEATS
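Building requires the CUDA Toolkit and a supported host compiler; running the resulting binaries requires an NVIDIA GPU and driver. A binary that contains neither SASS nor compatible PTX for the installed GPU fails at runtime. Debug builds with -G disable device code optimizations and run much slower. Kernels that use many registers per thread limit occupancy.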
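One way to reduce the risk of an architecture mismatch is to derive the -arch value from the installed GPU. The sketch below assumes a driver recent enough that nvidia-smi supports the `compute_cap` query field; the helper name `cap_to_arch`, the fallback value `8.6`, and the file name `app.cu` are hypothetical.

```shell
#!/bin/sh
# Sketch: turn a compute capability such as "8.6" into an -arch value such as sm_86.
cap_to_arch() {
    # Strip the dot ("8.6" -> "86") and prefix with sm_.
    printf 'sm_%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

cap=""
if command -v nvidia-smi >/dev/null 2>&1; then
    # Query the first visible GPU; older drivers may not know this field.
    cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n 1)
fi

# Fall back to a placeholder when the query is unavailable or returns something unexpected.
case "$cap" in
    [0-9]*.[0-9]*) ;;
    *) cap="8.6" ;;
esac

arch=$(cap_to_arch "$cap")
echo "nvcc -arch=$arch -o app app.cu"
```

On toolkits that support it, `-arch=native` achieves a similar result without the helper.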
HISTORY
nvcc was introduced with CUDA by NVIDIA around 2007. It enabled general-purpose GPU computing by providing a C-like language for programming NVIDIA GPUs, turning them from graphics-only devices into general-purpose processors.
SEE ALSO
nvidia-smi(1), cuda-gdb(1), gcc(1), clang(1)
