LinuxCommandLibrary

gawk

Process and manipulate text-based data

TLDR

Print the fifth column (a.k.a. field) in a space-separated file

$ gawk '{print $5}' [path/to/file]

Print the second column of the lines containing "foo" in a space-separated file

$ gawk '/foo/ {print $2}' [path/to/file]

Print the last column of each line in a file, using a comma (instead of space) as a field separator
$ gawk -F ',' '{print $NF}' [path/to/file]

Sum the values in the first column of a file and print the total
$ gawk '{s+=$1} END {print s}' [path/to/file]

Print every third line starting from the first line
$ gawk 'NR%3==1' [path/to/file]

Print different values based on conditions
$ gawk '{if ($1 == "foo") print "Exact match foo"; else if ($1 ~ "bar") print "Partial match bar"; else print "Baz"}' [path/to/file]

Print all lines whose 10th column value is between a minimum and a maximum
$ gawk '($10 >= [min_value] && $10 <= [max_value])' [path/to/file]

Print a table of users with UID >= 1000, with a header and formatted output, using a colon as the field separator (%-20s means a string left-aligned in 20 characters; %6s means a string right-aligned in 6 characters)
$ gawk 'BEGIN {FS=":"; printf "%-20s %6s %25s\n", "Name", "UID", "Shell"} $3 >= 1000 {printf "%-20s %6d %25s\n", $1, $3, $7}' /etc/passwd

SYNOPSIS

gawk [options] [-f progfile | 'program'] [files]

PARAMETERS

-F fs
    Set input field separator (FS variable).

-f file
    Read AWK program source from file.

-v var=val
    Assign val to var before program runs.

-b, --characters-as-bytes
    Treat all input data as single-byte characters, regardless of locale.

--posix
    Enforce POSIX-compatible behavior.

--traditional
    Disable GNU extensions for traditional AWK.

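--lint[=value]
    Warn about dubious or non-portable constructs. value may be fatal (warnings become errors), invalid (warn only about things that are actually invalid), or no-ext (do not warn about GNU extensions).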

-mf n
    Set a memory limit (maximum number of fields) in older Bell Labs awk. Ignored by gawk, which has no predefined limits.

-o[outfile]
    Write a pretty-printed version of the program to outfile (default: awkprof.out).

--profile[=file]
    Profile program execution to file.

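--sandbox
    Run in sandbox mode: disable system(), input redirection with getline, output redirection with print and printf, and loading of dynamic extensions.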

-V, --version
    Print version and exit.

-h, --help
    Print usage summary and exit.

--load ext
    Load extension ext at startup.

-e 'prog'
    Program source as argument (multiple ok).

DESCRIPTION

gawk is the GNU implementation of AWK, a pattern-matching language for data manipulation and reporting. It reads input one record at a time (by default, one line), splits each record into fields, tests it against patterns (often regular expressions), and runs the matching actions, such as printing reformatted data, performing calculations, or executing control flow.

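Programs consist of rules of the form pattern { action }: the pattern selects records (a rule without a pattern matches every record) and the action defines what to do with them (a rule without an action prints the record). The special patterns BEGIN and END run before the first record is read and after the last one. The language supports variables (e.g., FS for the input field separator, OFS for the output field separator), associative arrays, built-in functions (substr, match, sprintf), and user-defined functions.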
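
The pieces described above (a pattern selecting records, an associative array, a built-in function, and a user-defined function) can be sketched together as follows. The sample data and the function name initial are hypothetical, and the fallback to the system awk is an assumption so the sketch runs without gawk installed.

```shell
# Assumption: use gawk if installed, otherwise the system awk
# (only POSIX awk features are used here).
AWK=$(command -v gawk || command -v awk)

# Hypothetical sample data: name age
people='anna 30
alex 17
bob 22
amy 41'

counts=$(printf '%s\n' "$people" | "$AWK" '
  # User-defined function: upper-cased first letter of a string.
  function initial(s) { return toupper(substr(s, 1, 1)) }

  # Pattern: only records whose second field is at least 18.
  # Action: count them per initial in an associative array.
  $2 >= 18 { n[initial($1)]++ }

  # END runs once, after the last record has been read.
  END { print "A", n["A"]; print "B", n["B"] }
')

printf '%s\n' "$counts"
# prints:
# A 2
# B 1
```

The END rule names its keys explicitly because the iteration order of for (key in array) is not specified in awk.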

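gawk is well suited to log analysis, CSV processing, report generation, and sysadmin tasks. It reads from standard input or from named files and writes to standard output. GNU extensions include TCP/IP networking, internationalization, and dynamically loaded extensions via --load. Programs that stick to POSIX features are portable, but GNU extensions are not available in every AWK implementation.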

CAVEATS

GNU extensions reduce portability; use --posix to restrict gawk to POSIX behavior. Large arrays, or whole files held in variables, can consume a lot of memory. Multibyte locales may slow processing.

SPECIAL PATTERNS

BEGIN executes once before input; END once after; /regex/ matches lines.
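
A minimal sketch of the three kinds of pattern working together, using hypothetical input lines; the fallback to the system awk is an assumption so the sketch runs without gawk installed.

```shell
# Assumption: use gawk if installed, otherwise the system awk
# (only POSIX awk features are used here).
AWK=$(command -v gawk || command -v awk)

report=$(printf 'error: disk\nok\nerror: net\n' | "$AWK" '
  BEGIN    { print "matches:" }   # once, before any input is read
  /^error/ { print; c++ }         # for each line starting with "error"
  END      { print c " found" }   # once, after all input is read
')

printf '%s\n' "$report"
# prints:
# matches:
# error: disk
# error: net
# 2 found
```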

BUILT-IN VARIABLES

NR (current record number), NF (number of fields in the current record), $0 (whole record), $1..$n (individual fields).
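
To see these variables in action, the sketch below prints the record number, the field count, and the last field ($NF) of each line of some hypothetical input; the fallback to the system awk is an assumption so the sketch runs without gawk installed.

```shell
# Assumption: use gawk if installed, otherwise the system awk
# (only POSIX awk features are used here).
AWK=$(command -v gawk || command -v awk)

summary=$(printf 'a b c\nd e\n' | "$AWK" '{ print NR ": " NF " fields, last=" $NF }')

printf '%s\n' "$summary"
# prints:
# 1: 3 fields, last=c
# 2: 2 fields, last=e
```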

HISTORY

AWK was created in 1977 by Alfred Aho, Peter Weinberger, and Brian Kernighan at Bell Labs. gawk was first written in 1986 for the GNU Project; Arnold Robbins has worked on it since 1988, and extensions such as networking and profiling have been added over time.

SEE ALSO

awk(1), mawk(1), nawk(1), sed(1), grep(1)
