LinuxCommandLibrary

gawk

Process and manipulate text-based data

TLDR

Print the fifth column (a.k.a. field) in a space-separated file

$ gawk '{print $5}' [path/to/file]
copy

Print the second column of the lines containing "foo" in a space-separated file
$ gawk '/[foo]/ {print $2}' [path/to/file]
copy

Print the last column of each line in a file, using a comma (instead of space) as a field separator
$ gawk [[-F|--field-separator]] ',' '{print $NF}' [path/to/file]
copy

Sum the values in the first column of a file and print the total
$ gawk '{s+=$1} END {print s}' [path/to/file]
copy

Print every third line starting from the first line
$ gawk 'NR%3==1' [path/to/file]
copy

Print different values based on conditions
$ gawk '{if ($1 == "foo") print "Exact match foo"; else if ($1 ~ "bar") print "Partial match bar"; else print "Baz"}' [path/to/file]
copy

Print all the lines which the 10th column value is between a min and a max
$ gawk '($10 >= [min_value] && $10 <= [max_value])'
copy

Print table of users with UID >=1000 with header and formatted output, using colon as separator (%-20s mean: 20 left-align string characters, %6s means: 6 right-align string characters)
$ gawk 'BEGIN {FS=":";printf "%-20s %6s %25s\n", "Name", "UID", "Shell"} $4 >= 1000 {printf "%-20s %6d %25s\n", $1, $4, $7}' /etc/passwd
copy

SYNOPSIS

gawk [options] [-f program-file | --source program-text] [file ...]
gawk [options] program-text [file ...]

PARAMETERS

-F fs
    Sets the field separator for input records to fs. This can be a single character or a regular expression.

-v var=value
    Assigns a value to a variable var before program execution begins. Useful for passing external data into the gawk script.

-f program-file
    Reads the gawk program source from the specified program-file, allowing for larger, more complex scripts.

-W lint, --lint
    Issues warnings about constructs that are not portable to other awk implementations or might indicate a mistake in the script logic.

-W posix, --posix
    Disables GNU extensions and uses POSIX awk features exclusively, ensuring strict compatibility with the POSIX standard.

-W traditional, --traditional
    Behaves like original awk, disabling most GNU extensions for backward compatibility.

-E encoding, --encoding=encoding
    Specifies the character encoding for input and output, supporting various character sets.

--help
    Prints a summary of the gawk command-line options and usage information.

--version
    Prints gawk's version number and configuration information to standard output.

DESCRIPTION

gawk (GNU Awk) is a powerful pattern scanning and processing language. It reads input files line by line, attempting to match each line against a set of user-defined patterns. For each line that matches a pattern, gawk executes a corresponding action. This makes it exceptionally useful for data extraction, text transformation, and report generation from structured or semi-structured text files like logs, CSVs, or configuration files.

At its core, gawk operates on "fields" within a line, which are by default separated by whitespace. It provides built-in variables for the current record ($0), the number of fields (NF), the current record number (NR), and individual fields ($1, $2, etc.). It supports regular expressions for sophisticated pattern matching, offers conditional statements, loops, associative arrays, and user-defined functions, making it a complete programming language for text manipulation. gawk is the GNU Project's implementation of the awk programming language, designed to be compatible with the original awk while adding many GNU-specific extensions and features.

CAVEATS

While powerful, gawk can become memory-intensive for extremely large files or when using large associative arrays, potentially impacting performance. Its expressive syntax, while concise, can sometimes lead to less readable scripts for complex logic, especially for those new to awk. Locale settings can significantly affect string and regular expression behavior; using --posix often helps ensure consistent behavior across different environments.

PATTERNS AND ACTIONS

An awk program consists of a series of rules, each defined as pattern { action }. gawk reads each input line, and if the line matches the specified pattern, the action block is executed. If no pattern is given, the action is performed for every line. If no action is given, the line matching the pattern is simply printed. Patterns can be regular expressions, arithmetic relational expressions, or combinations thereof. Actions can include variables, loops, conditionals, and built-in or user-defined functions.

BEGIN AND END BLOCKS

gawk provides special patterns: BEGIN and END. The BEGIN block's action is executed once before any input lines are read and processed. It's typically used for initializing variables or printing headers. The END block's action is executed once after all input lines have been read. It's often used for final calculations, summaries, or printing footers. These blocks do not operate on input lines directly.

BUILT-IN VARIABLES

gawk provides several useful built-in variables that provide information about the input or control program behavior. Key ones include:
NR (Number of Record): The current input record number.
NF (Number of Fields): The total number of fields in the current input record.
FS (Field Separator): The input field separator (default is whitespace).
OFS (Output Field Separator): The output field separator (default is a space).
RS (Record Separator): The input record separator (default is newline).
ORS (Output Record Separator): The output record separator (default is newline).
$0: The entire current input record.
$1, $2, ...: Individual fields of the current input record.

HISTORY

The awk programming language was originally developed at Bell Labs in the 1970s by Alfred Aho, Peter Weinberger, and Brian Kernighan (hence A.W.K.). It was designed for text processing and report generation. gawk, the GNU Project's implementation, was written to be a free software alternative, offering compatibility with the original awk while introducing numerous extensions and enhancements. First released in the mid-1980s, gawk has since become the standard awk implementation on most Linux and Unix-like systems due to its robust feature set, adherence to standards (like POSIX), and continuous development.

SEE ALSO

awk(1), sed(1), grep(1), cut(1), sort(1)

Copied to clipboard