perf

Profile Linux kernel and user space performance

TLDR

Display basic performance counter stats for a command

$ perf stat [gcc hello.c]

Display system-wide real-time performance counter profile
$ sudo perf top

Run a command and record its profile into perf.data
$ sudo perf record [command]

Record the profile of an existing process into perf.data
$ sudo perf record -p [pid]

Read perf.data (created by perf record) and display the profile
$ sudo perf report

SYNOPSIS

perf command [options]

PARAMETERS

record
    Records performance data based on specified events and options.

stat
    Runs a command and gathers summary statistics about its execution.

top
    Displays a continuously updated profile of the functions collecting the most samples, similar to top(1) but driven by performance events (CPU cycles by default).

report
    Generates a report from a perf.data file created by the 'record' command.

annotate
    Annotates source code or assembly with performance data.

list
    Lists available performance events.

-a
    System-wide collection from all CPUs.

-p pid
    Monitors a specific process by its process ID.

-e event
    Specifies the performance event to monitor (e.g., cycles, cache-misses).

-g
    Enables call-graph recording.

-o file
    Specifies the output file for recorded data.

--call-graph mode
    Specifies the call graph recording mode (e.g., fp, dwarf).
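
As a rough illustration of how these options combine, the sketch below assembles a `perf record` command line from a PID, an event, and an output file, and prints it instead of running it. The helper name `perf_record_cmd` and the example values (PID 1234, `web.data`) are hypothetical; only the flags come from the list above.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a perf record command for a PID.
# Usage: perf_record_cmd PID [EVENT] [OUTPUT_FILE]
perf_record_cmd() {
    pid=$1
    event=${2:-cycles}      # -e: event to sample, defaults to cycles
    out=${3:-perf.data}     # -o: output file, defaults to perf.data
    # -g enables call-graph recording; -p attaches to the given PID.
    printf 'perf record -e %s -g -o %s -p %s\n' "$event" "$out" "$pid"
}

# Example values are illustrative only.
perf_record_cmd 1234 cache-misses web.data
```

Printing the line first makes it easy to review before running it with sufficient privileges. When attached with `-p`, recording continues until interrupted (Ctrl-C).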

DESCRIPTION

perf is the performance analysis tool maintained in the Linux kernel source tree. It is the user-space front end to the kernel's perf_events subsystem.

It analyzes the performance of both applications and the operating system itself. perf works by counting or sampling events such as CPU cycles, cache misses, branch mispredictions, and system calls, then aggregating the results to show where time is spent and where bottlenecks exist. This information can be used to optimize code, identify performance regressions, and understand how applications interact with the underlying hardware.

perf supports several modes of use, including sampling-based CPU profiling, tracing, and hardware event counting. It can generate detailed reports, call graphs, and visualizations to help developers pinpoint performance issues. It is a command-line tool, and many of its functions require root privileges or specific capabilities.

CAVEATS

Many features require root privileges, suitable capabilities, or a less restrictive `kernel.perf_event_paranoid` setting. Recording adds overhead that can affect the performance of the target application. Interpreting the results requires a solid understanding of hardware architecture and system behavior.
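
To see which restriction applies on a given machine, the current setting can be read from `/proc/sys/kernel/perf_event_paranoid`. The sketch below maps the value to a short description. The function name `describe_paranoid` is hypothetical, and the summaries paraphrase the kernel's sysctl documentation for levels -1 to 2; other values are treated as distribution-specific.

```shell
#!/bin/sh
# describe_paranoid LEVEL: summarize what an unprivileged user may do.
# Levels -1..2 follow the kernel sysctl documentation; anything else is
# assumed to be a distribution-specific extension.
describe_paranoid() {
    level=$1
    if [ "$level" -le -1 ]; then
        echo "unrestricted: almost all events available to all users"
    elif [ "$level" -eq 0 ]; then
        echo "CPU-wide and kernel profiling allowed; raw tracepoint access restricted"
    elif [ "$level" -eq 1 ]; then
        echo "kernel and user profiling of own processes; no system-wide CPU events"
    elif [ "$level" -eq 2 ]; then
        echo "user-space profiling of own processes only"
    else
        echo "nonstandard level; check the distribution documentation"
    fi
}

paranoid_file=/proc/sys/kernel/perf_event_paranoid
if [ -r "$paranoid_file" ]; then
    current=$(cat "$paranoid_file")
    echo "perf_event_paranoid=$current: $(describe_paranoid "$current")"
fi
```

The value can be changed temporarily with `sudo sysctl kernel.perf_event_paranoid=1`; lowering it has security implications and is a decision for the system administrator.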

EVENT SELECTION

Choosing the right events to monitor is crucial for effective performance analysis. The `perf list` command shows all available events, which fall broadly into hardware events, software events, and tracepoint events. Hardware events are counted by the CPU's performance monitoring unit and relate to CPU and memory operations (e.g., cycles, cache-misses). Software events are counters maintained by the kernel (e.g., context-switches, page-faults). Tracepoint events are static instrumentation points built into the kernel code, which let specific kernel activity, such as scheduling decisions or system calls, be monitored.
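
`perf list` accepts a category filter, so each group can be browsed separately. A minimal sketch, assuming `perf` may or may not be installed; the function name `show_category` is hypothetical, and the exact listing varies by kernel and CPU:

```shell
#!/bin/sh
# show_category CATEGORY: print the first lines of `perf list CATEGORY`.
# Falls back to a message when perf is not installed.
show_category() {
    if command -v perf >/dev/null 2>&1; then
        # sed prints only lines 1-5 but still reads the full listing.
        perf list "$1" 2>/dev/null | sed -n '1,5p'
    else
        echo "perf not installed; would run: perf list $1"
    fi
}

# hw = hardware events, sw = software events, tracepoint = kernel tracepoints
for category in hw sw tracepoint; do
    echo "== $category =="
    show_category "$category"
done
```

Listing tracepoints may return nothing without sufficient privileges.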

DATA INTERPRETATION

Interpreting perf data requires an understanding of the system architecture and of the monitored application. High counts for certain events (e.g., cache misses, branch mispredictions), judged relative to the total amount of work done, can indicate performance bottlenecks. The `perf report` command shows which functions account for the samples, and `perf annotate` attributes them to individual source lines or instructions. Correlating the results with source code is essential for identifying optimization opportunities.
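
Raw counts are usually easier to judge as ratios. The sketch below computes cache misses as a percentage of cache references, using counts such as those reported by `perf stat -e cache-misses,cache-references`. The function name `miss_rate` and the input numbers are hypothetical, chosen only to show the arithmetic.

```shell
#!/bin/sh
# miss_rate MISSES REFERENCES: cache misses as a percentage of references.
miss_rate() {
    awk -v m="$1" -v r="$2" 'BEGIN {
        if (r == 0) { print "n/a"; exit }   # avoid division by zero
        printf "%.2f\n", 100 * m / r
    }'
}

# Hypothetical counts, not measured data: 25 misses out of 1000 references.
miss_rate 25 1000    # prints 2.50
```

Whether a given ratio is a problem depends on the workload and hardware, so it is most useful for comparing runs of the same program.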

HISTORY

perf was merged into the Linux kernel in version 2.6.31 (2009), originally under the name Performance Counters for Linux, to provide a more unified and efficient mechanism for performance monitoring than earlier tools such as OProfile. Its development has been ongoing, with continuous improvements in event support, analysis capabilities, and user interface. Initially it was used mostly by kernel developers, but it has since been widely adopted by application developers and system administrators for general performance tuning.

SEE ALSO

top(1), vmstat(8), strace(1), oprofile(1)
