awk
Process and transform text-based data
TLDR
Print the fifth column (a.k.a. field) in a space-separated file
Print the second column of the lines containing "foo" in a space-separated file
Print the last column of each line in a file, using a comma (instead of space) as a field separator
Sum the values in the first column of a file and print the total
Print every third line starting from the first line
Print different values based on conditions
Print all the lines which the 10th column value is between a min and a max
Print table of users with UID >=1000 with header and formatted output, using colon as separator (%-20s mean: 20 left-align string characters, %6s means: 6 right-align string characters)
SYNOPSIS
awk [-F fs] [-v var=val] [-f progfile] [--posix] [--] 'program' [files]
PARAMETERS
-F fs
Field separator (fs); default whitespace or TAB
-f file
Read awk program from file instead of command line
-v var=val
Assign val to var before program runs (repeatable)
--
End options; treat following as filenames
-V
Print version and exit (gawk)
--help
Print help and exit (gawk)
--posix
Enforce POSIX compatibility (gawk)
-mf n
Limit function args to n (debugging; gawk)
-mr n
Limit record size to n bytes (debugging; gawk)
-W traditional
Use original awk behavior (gawk)
DESCRIPTION
awk is a versatile programming language and command-line tool for processing and analyzing text files. It scans input line-by-line, splits lines into fields using a delimiter (default: whitespace), and executes actions based on patterns. Key variables include $0 (entire line), $1–$n (fields), NF (field count), NR (record number).
Awk programs follow the structure pattern { action }: patterns select lines (e.g., /regex/, relational expressions); actions (in {}) perform operations like printing, arithmetic, or loops. Special blocks BEGIN {} run before input, END {} after. It supports arrays, functions, conditionals (if/else), loops (for/while), and math functions.
Ideal for data extraction, reporting, reformatting, and simple computations (e.g., summing columns, filtering logs). More expressive than grep or sed, it's Turing-complete yet concise for one-liners. On Linux, gawk (GNU implementation) is standard, adding extensions like true arrays and networking.
Example: awk '{print $1, $NF}' file.txt prints first/last fields per line. Use -F',' for CSV.
CAVEATS
Implementations vary: gawk (GNU, feature-rich), mawk (fast), original awk (minimal). GNU extensions (e.g., gensub()) reduce portability. Fields with leading/trailing whitespace may behave unexpectedly. Large inputs can consume memory due to slurp mode in some cases.
PROGRAM BLOCKS
BEGIN {}: Initialize before input.
{ action }: Every line (no pattern).
pattern {}: Matching lines.
END {}: Finalize after input.
Example: awk 'BEGIN{sum=0} {sum+=$1} END{print sum}'
FIELD ACCESS
$0: Full line.
$1: First field.
NF: Fields count.
NR: Record (line) number.
Use $(n) for nth field.
HISTORY
Created 1977 at Bell Labs by Alfred Aho, Peter Weinberger, Brian Kernighan (hence 'awk'). Evolved from sed-like tools; Version 7 Unix (1978). POSIX standardized 1985 (awk-1985), revised 1992 (awk-1992). GNU Awk (gawk) by Richard Stallman, 1986; now Linux default with extensions.


