gawk
Process and manipulate text-based data
SYNOPSIS
gawk [ options ] -f program-file [ file ... ]
gawk [ options ] [ program ] [ file ... ]
PARAMETERS
-F fs
Define the input field separator to be the regular expression fs.
-v var=val
Assign the value val to the variable var before execution begins.
-f program-file
Read the awk program source from the file program-file.
program
awk program source code. This is typically a string of pattern-action pairs.
file
Input file(s) to be processed. If no files are specified, gawk reads from standard input.
-b
Treat all input data as single-byte characters, regardless of the locale's multibyte encoding. This is useful when locale settings cause input to be misinterpreted.
-c
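Run in compatibility (traditional) mode, in which gawk behaves like Brian Kernighan's awk and none of the GNU-specific extensions are recognized. Strict POSIX mode is selected with -P instead.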
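-d[file]
Print a sorted list of global variables, their types, and their final values to file. If file is omitted, gawk writes to awkvars.out in the current directory. The interactive debugger is enabled with -D instead.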
-e program-text
Use program-text as awk program source code. This works like giving the program directly on the command line, but the option can be repeated and mixed with -f, so command-line source can be combined with code read from files.
-l library
Load a gawk extension from the shared library named library.
-n
Recognize octal and hexadecimal values in input data. Use with caution, since it changes how existing input is interpreted.
-o[file]
Write a pretty-printed version of the program to file. If file is omitted, gawk writes to awkprof.out in the current directory.
DESCRIPTION
gawk is the GNU implementation of the awk programming language. It is a versatile command-line utility for text processing, especially for data files organized as records and fields. gawk reads its input files (or standard input) one record at a time (by default, one line at a time) and checks each record against the patterns in the program. When a record matches, gawk executes the associated action, which can print, manipulate data, perform calculations, or control program flow. It is commonly used for data extraction, report generation, and data validation.
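gawk is based on the concept of pattern-action pairs. The pattern is a condition tested against each input record, and the action is a set of statements that is executed if the pattern matches. If the pattern is omitted, the action runs for every record; if the action is omitted, matching records are printed.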
It supports regular expressions, arithmetic operations, string manipulation, and user-defined functions. gawk can be used in scripts or directly from the command line, making it a valuable tool for system administrators, developers, and data analysts.
CAVEATS
Run time grows with the size of the input and the complexity of the awk program. Arrays that accumulate data across records are held in memory, so be mindful of resource usage when processing very large datasets.
VARIABLES
gawk maintains a set of built-in variables, including NF (number of fields in the current record), NR (number of records read so far), FS (input field separator, a single space by default), RS (input record separator, a newline by default), and OFS (output field separator, a single space by default). These variables can be read and modified within the awk program.
PATTERNS
gawk patterns can be regular expressions enclosed in forward slashes (/pattern/), relational expressions (e.g., $1 > 10), or combinations of patterns using logical operators (&&, ||, !). Patterns determine which lines will trigger the associated actions.
ACTIONS
gawk actions are enclosed in curly braces ({ }). Actions can include printing data (print), performing calculations, assigning values to variables, controlling program flow (if, for, while), and calling built-in or user-defined functions.
HISTORY
awk was created at Bell Labs in the 1970s by Alfred Aho, Peter Weinberger, and Brian Kernighan; the name is formed from the initials of their surnames. It became a powerful tool for manipulating data in files, and several implementations have been written since its creation. gawk is the GNU project's implementation and is widely used because it is available on most Linux distributions. It is backward compatible with the original awk and adds a number of improvements and extensions.