LinuxCommandLibrary

ncu

Profile CUDA applications on NVIDIA GPUs

TLDR

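Note: the examples in this section apply to npm-check-updates, a Node.js tool that also installs a command named ncu. They are not options of the NVIDIA Nsight Compute profiler described in the rest of this page.

List outdated dependencies in the current directory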

$ ncu

List outdated global npm packages
$ ncu --global

Upgrade all dependencies in the current directory
$ ncu --upgrade

Interactively upgrade dependencies in the current directory
$ ncu --interactive

List outdated dependencies up to the highest minor version
$ ncu --target [minor]

List outdated dependencies that match a keyword or regex
$ ncu --filter [keyword|/regex/]

List only a specific section of outdated dependencies
$ ncu --dep [dev|optional|peer|prod|packageManager]

Display help
$ ncu --help

SYNOPSIS

ncu [options] application [application arguments]

PARAMETERS

-o file
    Specifies the output report file name. The default extension is '.ncu-rep'.

--set set_name
    Collects a predefined set of sections. Sets in recent releases include 'basic', 'detailed', 'full', and 'roofline'; run 'ncu --list-sets' to see the sets available in the installed version.

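--metrics metrics_list
    A comma-separated list of specific metrics to collect. When given without '--set' or '--section', only the listed metrics are collected; when combined with them, the listed metrics are collected in addition.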

--kernel-name kernel_name
    Profiles only kernels with the given name. Prefix the value with 'regex:' to match by regular expression instead of by exact name.

--replay-mode mode
    Configures how kernel launches are replayed for metric collection. Modes include 'kernel' (default; each kernel launch is replayed in place), 'application' (the whole application is rerun once per collection pass), 'range', and 'app-range'.

--export file
    Long form of '-o'. It takes a report file name, not a format, and writes a '.ncu-rep' report. For CSV, add '--csv' to print the console output as comma-separated values.

--devices ids
    Restricts profiling to the given GPU devices, specified as a comma-separated list of device IDs.

--target-processes mode
    Selects which processes are profiled. Modes are 'application-only' (only the launched root process) and 'all' (the root process and its child processes).

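--cpu-sampling-mode mode
    This option does not appear among the documented Nsight Compute CLI options; check 'ncu --help' for the installed version before relying on it. CPU sampling is a feature of Nsight Systems (nsys).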

--help
    Displays help information for the command or a specific option.
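
As a minimal sketch of how the options above combine, the following shell function assembles a typical invocation: the 'full' set, a single kernel selected by exact name, and a named report file. The function name ncu_profile and the NCU_DRY_RUN variable are hypothetical helpers introduced here for illustration; they are not part of ncu.

```shell
# Sketch: profile one kernel of an application with the 'full' set.
# ncu_profile and NCU_DRY_RUN are illustrative names, not ncu features.
ncu_profile() {
    # $1 = report name (ncu appends .ncu-rep)
    # $2 = exact kernel name
    # remaining arguments = application and its arguments
    report=$1
    kernel=$2
    shift 2
    # Rebuild the positional parameters as the full ncu command line.
    set -- ncu --set full --kernel-name "$kernel" -o "$report" "$@"
    if [ "${NCU_DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"    # print the command instead of running it
    else
        "$@"
    fi
}
```

For example, ncu_profile report vecAdd ./my_app 1024 would profile the kernel vecAdd of a hypothetical ./my_app and write report.ncu-rep. Setting NCU_DRY_RUN=1 prints the command without running it, which is useful on machines without a GPU.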

DESCRIPTION

ncu is the command-line interface of NVIDIA Nsight Compute, a kernel-level profiler for CUDA applications. It collects hardware and software metrics for individual kernel launches, covering GPU utilization, memory access patterns, and kernel execution, which helps locate bottlenecks in CUDA kernels. ncu replaces nvprof for detailed kernel profiling; API-level and system-wide tracing is handled by the companion tool Nsight Systems (nsys). ncu can profile selected kernels or every kernel in an application run, and it writes report files that can be opened in the Nsight Compute GUI (ncu-ui) or printed on the command line.

CAVEATS

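ncu requires a supported NVIDIA GPU and driver; it ships with the CUDA Toolkit and as a standalone Nsight Compute download. Depending on driver configuration, access to GPU performance counters may require elevated privileges.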
Profiling can introduce significant overhead to application execution, especially with extensive metric collection.
Analyzing large-scale applications or collecting many metrics can generate very large report files.
Understanding the collected metrics and interpreting the results often requires a good grasp of CUDA programming and GPU architecture.
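
One common way to limit overhead and report size is to profile only a window of kernel launches. The sketch below uses the --launch-skip and --launch-count options for this; the function name ncu_profile_window and the NCU_DRY_RUN variable are hypothetical helpers for illustration only.

```shell
# Sketch: skip the first N kernel launches, then profile the next M.
# ncu_profile_window and NCU_DRY_RUN are illustrative names, not ncu features.
ncu_profile_window() {
    # $1 = number of launches to skip
    # $2 = number of launches to profile
    # remaining arguments = application and its arguments
    skip=$1
    count=$2
    shift 2
    set -- ncu --launch-skip "$skip" --launch-count "$count" "$@"
    if [ "${NCU_DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"    # print the command instead of running it
    else
        "$@"
    fi
}
```

Calling ncu_profile_window 10 5 ./my_app would skip ten launches of a hypothetical ./my_app (for example, warm-up iterations) and profile the following five.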

OUTPUT FORMATS

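The default output of -o is a binary .ncu-rep report, which can be opened in the Nsight Compute GUI (ncu-ui) or read back on the command line with --import. The --export option is the long form of -o and does not select a format.
csv: Adding --csv prints the console output as comma-separated values, for tabular analysis in spreadsheets; --page raw selects the full per-kernel metric table.
html, sqlite: These are not export formats of the ncu command line. SQLite export belongs to Nsight Systems (nsys); for programmatic access to .ncu-rep files, Nsight Compute ships a Python report interface.
Plain-text and CSV console output can be redirected to a file for use in other analysis tools and custom visualization scripts.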
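
As a hedged sketch of getting tabular data out of a saved report, the function below reads a report back with --import and prints the raw metric page as CSV. It assumes an Nsight Compute release where CSV output is produced with --csv; the function name ncu_report_csv and the NCU_DRY_RUN variable are hypothetical helpers for illustration.

```shell
# Sketch: print the raw metric page of an existing report as CSV.
# ncu_report_csv and NCU_DRY_RUN are illustrative names, not ncu features.
ncu_report_csv() {
    # $1 = path to an existing .ncu-rep report
    set -- ncu --import "$1" --page raw --csv
    if [ "${NCU_DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"    # print the command instead of running it
    else
        "$@"
    fi
}
```

Redirecting the output, as in ncu_report_csv report.ncu-rep > report.csv, yields a file that spreadsheets and scripts can read. No GPU is needed for this step, since the report is only read, not collected.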

METRIC COLLECTION SETS

ncu provides predefined sets of sections for common profiling scenarios, simplifying the collection process:
basic: A small set of essential sections for a quick overview.
detailed: A larger selection of sections than basic, at higher overhead.
full: The most comprehensive set, useful for deep dives.
roofline: Sections that feed the roofline charts.
There are no sets named memory or compute; memory and compute analysis come from sections such as MemoryWorkloadAnalysis and ComputeWorkloadAnalysis, which can be selected with --section. Set names vary between releases, so check ncu --list-sets.
Users can also specify individual metrics using the --metrics option for highly customized profiling.
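
The following sketch shows collecting a hand-picked list of metrics instead of a set. The metric names in the usage note are examples that exist on recent GPUs, but availability depends on the architecture, so treat them as assumptions and confirm with ncu --query-metrics. The function name ncu_collect_metrics and the NCU_DRY_RUN variable are hypothetical helpers for illustration.

```shell
# Sketch: collect only specific metrics for an application.
# ncu_collect_metrics and NCU_DRY_RUN are illustrative names, not ncu features.
ncu_collect_metrics() {
    # $1 = comma-separated metric names
    # remaining arguments = application and its arguments
    metrics=$1
    shift
    set -- ncu --metrics "$metrics" "$@"
    if [ "${NCU_DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"    # print the command instead of running it
    else
        "$@"
    fi
}
```

For example, ncu_collect_metrics gpu__time_duration.sum,sm__throughput.avg.pct_of_peak_sustained_elapsed ./my_app would collect kernel duration and SM throughput for a hypothetical ./my_app. Running ncu --list-sets beforehand shows which sets and sections the installed version offers.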

HISTORY

ncu is the direct successor to nvprof, which was the primary command-line profiler for CUDA applications for many years. With the introduction of the Nsight Compute suite, ncu was developed to offer more advanced features, deeper insights into GPU hardware units, and a more robust metric collection system. It leverages a modern architecture that allows for more flexible and detailed profiling, making it the recommended tool for GPU-specific performance analysis within the NVIDIA ecosystem.

SEE ALSO

nvidia-smi(1), nsys(1), cuda-gdb(1), nvprof(1) (legacy)
