bpftrace

Trace and analyze Linux kernel and user programs

TLDR

List all available probes

$ sudo bpftrace -l

Run a one-liner program (e.g. syscall count by program)

$ sudo bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'

Run a program from a file

$ sudo bpftrace [path/to/file]

Trace a program by PID

$ sudo bpftrace -e 'tracepoint:raw_syscalls:sys_enter /pid == 123/ { @[comm] = count(); }'

Do a dry run and display the output in eBPF format

$ sudo bpftrace -d -e '[one_line_program]'

Display version

$ bpftrace --version

SYNOPSIS

bpftrace [options] FILE [args...]
bpftrace [options] -e 'program' [args...]

-e 'program'
    Execute the specified program string.
    This allows one-liner scripts without saving them to a file.

-p PID
    Attach to a specific process ID for user-space probes (e.g., uprobes, USDT).

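-c 'COMMAND'
    Run COMMAND, enable USDT probes on the resulting process, and exit when it exits. Probes are not filtered to COMMAND automatically; its PID is available as the `cpid` builtin for use in filters such as `/pid == cpid/`.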

-o FILE
    Write output to FILE instead of standard output.

-B MODE
    Set the output buffering mode ('line', 'full', or 'none').

-f FORMAT
    Specify output format (e.g., 'json', 'text').

-l [PROBE]
    List available probes. If PROBE is specified, list probes matching the pattern.

-lv [PROBE]
    List available probes along with their arguments (verbose listing). If PROBE is specified, list only matching probes.

-v
    Enable verbose output, showing more details about compilation and execution.

-d
    Debug dry run: print internal debug information, such as the parsed program and generated code, instead of running the program.

-q
    Quiet mode: suppress informational messages such as "Attaching N probes...".

-h
    Show help message and exit.

-V
    Display version information and exit.
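As a sketch of how `-c` and `-o` work together, the script below counts system calls per thread for the launched command only, by filtering on the `cpid` builtin. The file name `count_child.bt` and the command `./myapp` are hypothetical placeholders.

```bpftrace
// count_child.bt (hypothetical name): count system calls per thread of the
// command started with -c. cpid holds that command's PID; without this filter
// every process on the system would be counted. Processes that the command
// forks get new PIDs and are therefore not matched.
tracepoint:raw_syscalls:sys_enter
/pid == cpid/
{
    @syscalls_by_tid[tid] = count();
}
```

Run it with `sudo bpftrace -c './myapp' -o syscalls.txt count_child.bt`; bpftrace exits when `./myapp` exits and the map is printed to `syscalls.txt`.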

DESCRIPTION

bpftrace is a high-level tracing language for Linux built on eBPF (extended Berkeley Packet Filter). It lets users dynamically inspect and trace kernel and user-space activity with minimal overhead.

Users write short, C-like scripts that bpftrace compiles (using LLVM) into eBPF bytecode. The kernel's eBPF verifier checks this bytecode before it is loaded, which is what allows it to run safely and efficiently inside the kernel.

The tool enables deep introspection into system behavior, such as tracking system calls, monitoring kernel function entry/exit, observing user-space function calls, and analyzing performance events. It's widely used for debugging performance issues, understanding system bottlenecks, and gaining insights into complex software interactions without modifying or recompiling applications. Its low overhead and powerful scripting capabilities make it an invaluable tool for system administrators, developers, and SREs.

CAVEATS

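Requires a relatively recent Linux kernel because it relies on modern eBPF features; the exact minimum depends on the bpftrace release and the features used, and older 4.x kernels support only a subset of them.
Typically run as root: loading eBPF programs requires CAP_SYS_ADMIN, or CAP_BPF together with CAP_PERFMON on Linux 5.8 and later.
While designed for low overhead, inefficiently written scripts or probes on very frequent events can still noticeably impact system performance.
The kernel's eBPF JIT compiler (sysctl `net.core.bpf_jit_enable`) should be enabled for best performance; without it, programs run in the slower in-kernel interpreter.
Some features, such as using kernel struct definitions, may require kernel headers unless the kernel exposes BTF type information.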

PROBE TYPES

bpftrace supports a wide array of probe types for tracing different system events:

kprobe/kretprobe: Attach to kernel function entry/exit points.
uprobe/uretprobe: Attach to user-space function entry/exit points in a process.
tracepoint: Attach to stable kernel tracepoints (e.g., `syscalls`, `sched`, `net`).
syscalls: Traced through the `syscalls` tracepoint category (e.g., `tracepoint:syscalls:sys_enter_openat`) or the generic `raw_syscalls` tracepoints.
profile: Fires on every CPU at a fixed rate, typically used to sample stack traces (e.g., `profile:hz:99`, `profile:s:1`).
interval: Fires on a single CPU at a fixed time interval (e.g., `interval:s:1`).
software/hardware: Access performance counter events (e.g., `software:cpu-clock`, `hardware:cache-misses`).
USDT: User-level Statically Defined Tracing, for applications with explicit tracepoints.
BEGIN/END: Special probes that run once at the start/end of the bpftrace script's execution.
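A sketch combining several of these probe types: `profile` samples which process is on-CPU, `interval` prints and resets the counts periodically, and `BEGIN`/`END` bracket the run. The map name `@samples` is illustrative.

```bpftrace
BEGIN
{
    printf("Sampling on-CPU processes at 99 Hz, reporting every 5 s. Ctrl-C to stop.\n");
}

// profile fires on every CPU 99 times per second; pid 0 is the idle task.
profile:hz:99
/pid != 0/
{
    @samples[comm] = count();
}

// interval fires on a single CPU every 5 seconds.
interval:s:5
{
    time("%H:%M:%S\n");
    print(@samples);
    clear(@samples);
}

END
{
    // Clear the map so it is not printed a second time on exit.
    clear(@samples);
}
```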

SCRIPT STRUCTURE

A bpftrace script consists of one or more clauses, each defined by a `probe /filter/ { action }` structure:

probe: Specifies the event to attach to (e.g., `kprobe:do_sys_open`).
filter: An optional boolean expression that must evaluate to true for the `action` to execute (e.g., `/pid == 1234/`).
action: A block of C-like code enclosed in braces `{}` that executes when the probe fires and the filter passes. Actions can include printing, aggregating data into maps, and manipulating variables.

Example: `kprobe:do_sys_open { printf("Open called by PID %d\n", pid); }`
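A slightly fuller clause using a tracepoint and a string filter is sketched below. It assumes a recent bpftrace that accesses tracepoint fields as `args.field`; older releases spell this `args->filename`.

```bpftrace
// Print every file opened via openat(2) by processes named "bash".
tracepoint:syscalls:sys_enter_openat
/comm == "bash"/
{
    // args.filename is a user-space pointer; str() copies it into a string.
    // (Older bpftrace releases: str(args->filename).)
    printf("%s (PID %d) opened %s\n", comm, pid, str(args.filename));
}
```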

VARIABLES AND MAPS

bpftrace provides several types of variables and powerful data structures (maps) for aggregating and storing tracing data:

Built-in Variables: Predefined variables like `pid` (process ID), `tid` (thread ID), `uid` (user ID), `comm` (command name), `arg0`, `arg1`, ..., `argN` (function arguments in kprobes/uprobes), `retval` (return value in kretprobes/uretprobes), etc.
Global Variables: User-defined variables with a leading `@` (e.g., `@my_var`). They are accessible across all probes.
Scratch Variables: User-defined variables with a leading `$` (e.g., `$my_var`). They are local to the action block that defines them and are not preserved between probe firings. For per-thread state, use a map keyed by thread ID (e.g., `@start[tid]`).
Associative Arrays (Maps): Key-value stores for aggregation, also prefixed with `@` (e.g., `@counts[comm] = count();`). Common aggregation functions include `count()`, `sum()`, `avg()`, `min()`, `max()`, `hist()` (power-of-2 histogram), and `lhist()` (linear histogram). Maps can be printed explicitly with `print()` (e.g., in the `END` probe); any maps still non-empty when the script exits are printed automatically.
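The variable kinds above typically work together as in this sketch of a read(2) latency histogram: a map keyed by `tid` carries the start time between the entry and exit probes of the same thread, a `$` scratch variable holds the intermediate result, and `hist()` aggregates. The map names are illustrative.

```bpftrace
// Record when each thread enters read(2).
tracepoint:syscalls:sys_enter_read
{
    @start[tid] = nsecs;
}

// On exit, only handle threads whose entry was seen after tracing began.
tracepoint:syscalls:sys_exit_read
/@start[tid]/
{
    $dur_us = (nsecs - @start[tid]) / 1000;   // scratch variable, local to this block
    @read_latency_us = hist($dur_us);          // power-of-2 histogram in microseconds
    delete(@start[tid]);
}

END
{
    // Drop leftover start times so only the histogram is printed on exit.
    clear(@start);
}
```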

HISTORY

bpftrace was created by Alastair Robertson and developed further with major contributions from Brendan Gregg and others, as a high-level frontend for eBPF. It was first publicly released in late 2018.
Its design was heavily inspired by Sun's DTrace but specifically built for the Linux eBPF framework. This filled a critical gap by providing an intuitive, expressive language for dynamic tracing on Linux, making the powerful eBPF capabilities more accessible to system administrators and developers. It rapidly gained adoption as a go-to tool for deep system observability.