sponge

Soak standard input before writing output

TLDR

Append file content to the source file

$ cat [path/to/file] | sponge -a [path/to/file]

Remove all lines starting with # in a file

$ grep -v '^#' [path/to/file] | sponge [path/to/file]

-a
    Append standard input to the existing content of the file instead of overwriting it. The updated file replaces the original atomically when possible.

`-a` is the only option documented in the moreutils manual page. `sponge` preserves the permissions of an existing output file by default; no option is needed for that.

DESCRIPTION

`sponge` is a command-line utility from the `moreutils` package that soaks up all of its standard input before writing anything. Unlike `tee` or a shell redirect, which open the output file as soon as the pipeline starts, `sponge` opens the output file only after it has read its input to the end. This makes it safe to build pipelines that read from and write to the same file. With a plain redirect such as `grep pattern file > file`, the shell truncates `file` before `grep` starts reading it, so the original content is lost. With `grep pattern file | sponge file`, the file is rewritten only once `grep` has finished. `sponge` takes a single output file; if none is given, it writes the buffered input to standard output.

CAVEATS

`sponge` must hold its entire input before writing. Current moreutils releases keep it in memory and spill to a temporary file (under `TMPDIR`) once it grows large, so very large inputs (gigabytes or more) cost memory, temporary disk space, or both. Older releases may buffer purely in memory and can run out of it.
It is not suitable for processing unbounded or real-time data streams, as it must wait for the end of the input (EOF) before writing.
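While it prevents self-truncation, it does no locking: if another process changes the file between the moment the pipeline reads it and the moment `sponge` writes it, those changes are lost. When possible, `sponge` replaces the file atomically by renaming a temporary file into place; this is not possible when the output is a symlink or special file, or when `TMPDIR` is on a different filesystem, and the write is then non-atomic.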

COMMON USE CASE: IN-PLACE FILE MODIFICATION

The primary use of `sponge` is to safely modify a file in-place using standard pipelines. For example, to remove all lines containing 'obsolete' from `my_file.txt`:
`grep -v obsolete my_file.txt | sponge my_file.txt`
Without `sponge`, `grep -v obsolete my_file.txt > my_file.txt` loses the data: the shell truncates `my_file.txt` when it sets up the redirect, before `grep` reads anything.
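
The difference can be checked safely in a scratch directory. The following is a minimal sketch; the file names `broken.txt` and `fixed.txt` and the variables `dir` and `broken_size` are made up for the demonstration, and the `sponge` step only runs if `sponge` is installed.

```shell
# Work in a throwaway directory so no real file is touched.
dir=$(mktemp -d)
printf 'keep 1\nobsolete 2\nkeep 3\n' > "$dir/broken.txt"
cp "$dir/broken.txt" "$dir/fixed.txt"

# Unsafe: the shell truncates broken.txt before grep starts reading it.
# grep then exits non-zero (and some versions print a warning), hence "|| true".
grep -v obsolete "$dir/broken.txt" > "$dir/broken.txt" || true
broken_size=$(wc -c < "$dir/broken.txt")
echo "bytes left after plain redirect: $broken_size"

# Safe: sponge opens fixed.txt only after grep has finished writing.
if command -v sponge >/dev/null 2>&1; then
    grep -v obsolete "$dir/fixed.txt" | sponge "$dir/fixed.txt"
    cat "$dir/fixed.txt"
fi
```

After the plain redirect, `broken.txt` is empty; after the `sponge` pipeline, `fixed.txt` holds only the two `keep` lines.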

BUFFERING MECHANISM

`sponge`'s core functionality relies on reading its entire input into an internal buffer, held in memory and backed by a temporary file for large inputs. Only after it reaches end-of-file (EOF) on standard input, meaning every upstream command has finished writing, does `sponge` open the output file and write the buffered content. This read-all-then-write order is what prevents truncation when the source and destination of a pipeline are the same file.
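
The read-all-then-write order can be illustrated with a small shell function. This is only a sketch of the idea, not how `sponge` itself is implemented: the name `soak` and the variables `soak_tmp`, `soak_status`, and `demo` are hypothetical, and the function always buffers through a temporary file and does not replace the target atomically.

```shell
# soak: hypothetical stand-in for sponge, for illustration only.
# Usage: some_command | soak path/to/file
soak() {
    soak_tmp=$(mktemp) || return 1
    # Step 1: drain standard input completely; the target is not touched yet.
    cat > "$soak_tmp" || { rm -f "$soak_tmp"; return 1; }
    # Step 2: only after EOF, open (and truncate) the target and copy the data in.
    cat "$soak_tmp" > "$1"
    soak_status=$?
    rm -f "$soak_tmp"
    return "$soak_status"
}

# Sort a file in place: sort has finished reading before soak opens the file.
demo=$(mktemp)
printf 'b\na\nc\n' > "$demo"
sort "$demo" | soak "$demo"
cat "$demo"
```

Because the target is opened only in step 2, the command on the left of the pipe always sees the original content. Writing through `> "$1"` keeps the existing file and its permissions, at the cost of a non-atomic update.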

HISTORY

`sponge` is part of the `moreutils` collection of command-line utilities, which are designed to fill common gaps in standard Unix toolsets. Developed by Joey Hess, `moreutils` aims to provide simple, robust, single-purpose tools. `sponge` was created to solve the long-standing problem of safely modifying a file in-place through shell pipelines and redirection. Tool-specific options such as `sed -i` or hand-written temporary files address the same problem, but `sponge` offers a more general solution that works at the end of any pipeline.