mawk

Process text files, line by line

SYNOPSIS

mawk [options] [--] 'program' [file ...]
mawk [options] -f program_file [file ...]

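OPTIONS

-F fs
    Sets the field separator FS to fs, which may be a single character or a regular expression. By default, fields are separated by runs of blanks, tabs, and newlines.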

-f program_file
    Reads the AWK program from the specified program_file instead of from the command line.

-v var=value
    Assigns a value to a program variable var before the AWK program begins execution.

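-W option
    Passes an implementation-specific option to mawk, for example -W version (print version information and exit), -W interactive (unbuffered writes to standard output and line-buffered reads from standard input), or -W dump (print an assembler-like listing of the compiled program and exit).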

--
    Marks the end of command-line options. The following arguments are taken as the program text (unless -f was given) and then as input files, even if they begin with a hyphen.
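As a sketch of how -F, -v and -- combine on one command line: the colon-separated sample records and the variable name min below are invented for illustration. Adding 0 to min forces a numeric comparison with the second field.

```sh
# Hypothetical colon-separated records (not a real /etc/passwd).
data='alice:1001:/bin/sh
bob:1002:/bin/bash
carol:999:/bin/sh'

# -F: splits fields on colons; -v min=1000 sets the awk variable "min"
# before the program starts; -- ends option processing.
# "min + 0" makes the comparison with $2 numeric.
over=$(printf '%s\n' "$data" | mawk -F: -v min=1000 -- '$2 >= min + 0 { print $1 }')
echo "$over"
```

Only the first two records pass the test, so the names alice and bob are printed, one per line.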

DESCRIPTION

mawk (Mike Brennan's AWK) is a fast, compact interpreter for the AWK programming language, widely available on Unix-like operating systems. It is a pattern-scanning and text-processing tool: it reads input, matches it against patterns, and runs the actions associated with those patterns. Compared with other implementations such as gawk (GNU Awk), mawk emphasizes execution speed and a small memory footprint while closely following the POSIX definition of AWK. This makes it a good fit for performance-sensitive scripts and resource-constrained environments.

Typical uses include extracting fields from text files, reformatting output, generating reports, and performing multi-step text transformations. mawk reads its input one record at a time (by default, one line), splits each record into fields ($1, $2, ...), tests the record against each pattern in the program, and runs the associated action whenever a pattern matches. Its compact syntax and regular-expression support make it a staple tool for system administrators, developers, and anyone doing data analysis on text.
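A minimal sketch of the record/field model and a small report, assuming a made-up input of user names and byte counts. The for (key in array) loop visits keys in an unspecified order, so the output is piped through sort.

```sh
# Hypothetical whitespace-separated log: user name and bytes transferred.
log='alice 120
bob 300
alice 80'

# Each input line is one record; $1 and $2 are its first two fields.
# The array "total" accumulates bytes per user; END prints one aligned line per user.
report=$(printf '%s\n' "$log" | mawk '
    { total[$1] += $2 }
    END {
        for (user in total)
            printf "%-6s %5d\n", user, total[user]
    }' | sort)
echo "$report"
```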

CAVEATS

mawk follows POSIX AWK and does not implement many GNU Awk (gawk) extensions, for example built-in functions such as gensub() and strtonum(), the three-argument form of match(), GNU regular-expression operators such as \y, \< and \>, and gawk-specific variables such as IGNORECASE and FPAT. Scripts that rely on these extensions may fail to parse or may behave differently under mawk. Scripts written to the POSIX standard run under mawk, gawk, and other implementations alike; gawk remains the usual choice when its extra features are needed.
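One common porting case, sketched under the assumption that a gawk script used the three-argument match() to capture the matched text: the POSIX two-argument match() sets RSTART and RLENGTH, and substr() can extract the match from those. The input string is invented.

```sh
# gawk-only form, not accepted by mawk:
#   match($0, /[0-9]+\.[0-9]+/, m); print m[0]
# Portable form: two-argument match() sets RSTART (start) and RLENGTH (length).
ver=$(echo 'release 1.3.4 built today' | mawk '
    match($0, /[0-9]+\.[0-9]+(\.[0-9]+)?/) {
        print substr($0, RSTART, RLENGTH)
    }')
echo "$ver"
```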

MAWK VS. GAWK

Both are AWK implementations. mawk concentrates on execution speed, a small memory footprint, and POSIX behavior. gawk, the GNU Project's AWK, adds many extensions and non-standard features and is often preferred for larger scripts that need that richer functionality.

BASIC PROGRAM STRUCTURE

An AWK program in mawk consists of pattern { action } pairs. Either part may be omitted: a pattern with no action prints each matching record, and an action with no pattern runs for every record. A program can also include BEGIN { action } blocks, executed before any input is read, and END { action } blocks, executed after all input has been processed. Together these provide a place for setup, per-record processing, and final summaries.
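A small sketch of the BEGIN / pattern / END structure, using invented status lines. The BEGIN block initializes the counter so the END report prints 0 rather than an empty string when nothing matches, and the NR == 1 line shows a pattern whose omitted action defaults to printing the record.

```sh
# Hypothetical input: three status lines.
input='ok disk
FAIL network
ok memory'

summary=$(printf '%s\n' "$input" | mawk '
    BEGIN   { failures = 0 }            # runs once, before any input is read
    NR == 1                             # pattern with no action: prints the record
    /^FAIL/ { failures++; print $2 }    # runs for each record matching the pattern
    END     { print failures " failure(s) in " NR " lines" }  # runs after the last record
')
echo "$summary"
```

This prints the first record, the second field of the failing record, and then the summary line.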

HISTORY

mawk was written by Mike Brennan with the goal of providing a fast, memory-efficient AWK implementation that closely follows the POSIX standard; it is now maintained by Thomas Dickey. Unlike gawk, which has accumulated a large set of extensions, mawk has stayed focused on performance and standard behavior. This makes it a dependable choice where system resources are limited or where POSIX behavior matters most. It is the default awk on several Linux distributions, including Debian and Ubuntu.