LinuxCommandLibrary

awk

Process and transform text-based data

TLDR

Print the fifth column (a.k.a. field) in a space-separated file

$ awk '{print $5}' [path/to/file]
copy

Print the second column of the lines containing "foo" in a space-separated file
$ awk '/[foo]/ {print $2}' [path/to/file]
copy

Print the last column of each line in a file, using a comma (instead of space) as a field separator
$ awk -F ',' '{print $NF}' [path/to/file]
copy

Sum the values in the first column of a file and print the total
$ awk '{s+=$1} END {print s}' [path/to/file]
copy

Print every third line starting from the first line
$ awk 'NR%3==1' [path/to/file]
copy

Print different values based on conditions
$ awk '{if ($1 == "foo") print "Exact match foo"; else if ($1 ~ "bar") print "Partial match bar"; else print "Baz"}' [path/to/file]
copy

Print all the lines which the 10th column value is between a min and a max
$ awk '($10 >= [min_value] && $10 <= [max_value])'
copy

Print table of users with UID >=1000 with header and formatted output, using colon as separator (%-20s mean: 20 left-align string characters, %6s means: 6 right-align string characters)
$ awk 'BEGIN {FS=":";printf "%-20s %6s %25s\n", "Name", "UID", "Shell"} $4 >= 1000 {printf "%-20s %6d %25s\n", $1, $4, $7}' /etc/passwd
copy

SYNOPSIS

awk [-F fs] [-v var=val] [-f progfile] [--posix] [--] 'program' [files]

PARAMETERS

-F fs
    Field separator (fs); default whitespace or TAB

-f file
    Read awk program from file instead of command line

-v var=val
    Assign val to var before program runs (repeatable)

--
    End options; treat following as filenames

-V
    Print version and exit (gawk)

--help
    Print help and exit (gawk)

--posix
    Enforce POSIX compatibility (gawk)

-mf n
    Limit function args to n (debugging; gawk)

-mr n
    Limit record size to n bytes (debugging; gawk)

-W traditional
    Use original awk behavior (gawk)

DESCRIPTION

awk is a versatile programming language and command-line tool for processing and analyzing text files. It scans input line-by-line, splits lines into fields using a delimiter (default: whitespace), and executes actions based on patterns. Key variables include $0 (entire line), $1$n (fields), NF (field count), NR (record number).

Awk programs follow the structure pattern { action }: patterns select lines (e.g., /regex/, relational expressions); actions (in {}) perform operations like printing, arithmetic, or loops. Special blocks BEGIN {} run before input, END {} after. It supports arrays, functions, conditionals (if/else), loops (for/while), and math functions.

Ideal for data extraction, reporting, reformatting, and simple computations (e.g., summing columns, filtering logs). More expressive than grep or sed, it's Turing-complete yet concise for one-liners. On Linux, gawk (GNU implementation) is standard, adding extensions like true arrays and networking.

Example: awk '{print $1, $NF}' file.txt prints first/last fields per line. Use -F',' for CSV.

CAVEATS

Implementations vary: gawk (GNU, feature-rich), mawk (fast), original awk (minimal). GNU extensions (e.g., gensub()) reduce portability. Fields with leading/trailing whitespace may behave unexpectedly. Large inputs can consume memory due to slurp mode in some cases.

PROGRAM BLOCKS

BEGIN {}: Initialize before input.
{ action }: Every line (no pattern).
pattern {}: Matching lines.
END {}: Finalize after input.
Example: awk 'BEGIN{sum=0} {sum+=$1} END{print sum}'

FIELD ACCESS

$0: Full line.
$1: First field.
NF: Fields count.
NR: Record (line) number.
Use $(n) for nth field.

HISTORY

Created 1977 at Bell Labs by Alfred Aho, Peter Weinberger, Brian Kernighan (hence 'awk'). Evolved from sed-like tools; Version 7 Unix (1978). POSIX standardized 1985 (awk-1985), revised 1992 (awk-1992). GNU Awk (gawk) by Richard Stallman, 1986; now Linux default with extensions.

SEE ALSO

sed(1), grep(1), cut(1), perl(1), mawk(1)

Copied to clipboard