cut
Extract sections from lines of text
TLDR
Print a specific [c]haracter/[f]ield range of each line
Print a field range of each line with a specific delimiter
Print a character range of each line of the specific file
Print specific fields of NUL-terminated records (e.g. the output of `find . -print0`) instead of newline-terminated lines
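The four one-liners above can be sketched roughly as follows. The sample data, the variable names, and the temporary file are assumptions made for illustration, and `-z` is a GNU extension that may be missing from other implementations.

```shell
# Sample input (hypothetical data): three comma-separated fields per line.
data='alpha,beta,gamma
one,two,three'

# Characters 1-3 of each line.
chars=$(printf '%s\n' "$data" | cut -c 1-3)

# Fields 1 and 3, using a comma as the delimiter.
fields=$(printf '%s\n' "$data" | cut -d ',' -f 1,3)

# The same character range, read from a file instead of a pipe.
tmpfile=$(mktemp)
printf '%s\n' "$data" > "$tmpfile"
from_file=$(cut -c 1-3 "$tmpfile")
rm -f "$tmpfile"

# First '/'-separated field of NUL-terminated records (GNU -z).
# tr turns the NULs into newlines so the result can be stored and printed.
nul_fields=$(printf 'a/b\0c/d\0' | cut -z -d '/' -f 1 | tr '\0' '\n')

printf '%s\n' "$chars" "$fields" "$from_file" "$nul_fields"
```

In real use the NUL-terminated input would typically come from a command such as `find . -print0` rather than from `printf`.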
SYNOPSIS
cut OPTION... [FILE]...
PARAMETERS
-b, --bytes=LIST
Select only these bytes.
-c, --characters=LIST
Select only these characters.
-d, --delimiter=DELIM
Use DELIM instead of TAB for field delimiter.
-f, --fields=LIST
Select only these fields; also print any line that contains no delimiter character, unless the -s option is specified.
-n
With `-b`: don't split multibyte characters (the behavior POSIX specifies). GNU `cut` accepts this option but ignores it.
-s, --only-delimited
Do not print lines not containing delimiters.
--output-delimiter=STRING
Use STRING as the output delimiter. The default is to use the input delimiter.
-z, --zero-terminated
Treat NUL, not newline, as the line delimiter (GNU extension).
--help
Display a help message and exit.
--version
Output version information and exit.
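As a rough illustration of how `-f`, `-s`, and `--output-delimiter` interact, the sketch below runs the same hypothetical input through three invocations. The input text and variable names are assumptions; `--output-delimiter` is a GNU option.

```shell
# Hypothetical input: the second line contains no ':' delimiter.
input='user:x:1000
no delimiter here
root:x:0'

# Without -s, the line that lacks a delimiter is passed through unchanged.
with_passthrough=$(printf '%s\n' "$input" | cut -d ':' -f 1)

# With -s, lines that lack the delimiter are dropped.
only_delimited=$(printf '%s\n' "$input" | cut -s -d ':' -f 1)

# --output-delimiter replaces ':' between the selected fields on output.
rejoined=$(printf '%s\n' "$input" | cut -s -d ':' -f 1,3 --output-delimiter=' -> ')

printf '%s\n' "$with_passthrough" '---' "$only_delimited" '---' "$rejoined"
```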
DESCRIPTION
The `cut` command extracts sections from each line of a file or of standard input. It selects portions of a line by byte position, by character position, or by delimiter-separated field. Common use cases include extracting columns from delimited data (such as simple CSV files that contain no quoted delimiters, since `cut` does not interpret quoting), extracting characters from fixed-width data, and trimming text strings. `cut` is a fundamental text processing tool, often combined with other commands via pipes to perform more complex data transformations. It is designed for simple extraction tasks; for pattern matching or more involved data manipulation, tools like `awk` or `sed` are often preferred. Even so, its simplicity and efficiency make it valuable in many scripting and command-line workflows.
It fits well in shell scripts because an invocation needs little more than a delimiter and a list of positions, which keeps commands short and readable.
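A minimal sketch of `cut` in a pipeline, assuming passwd-style records in which field 7 is the login shell. The records and the variable names are made up for illustration.

```shell
# Hypothetical passwd-style records; field 7 is the login shell.
records='alice:x:1000:1000::/home/alice:/bin/bash
bob:x:1001:1001::/home/bob:/bin/zsh
carol:x:1002:1002::/home/carol:/bin/bash'

# cut extracts the shell column; sort -u reduces it to the distinct values.
distinct_shells=$(printf '%s\n' "$records" | cut -d ':' -f 7 | sort -u)

printf '%s\n' "$distinct_shells"
```

Here `cut` does only the column extraction and leaves deduplication to `sort`, which is the usual division of labor in such pipelines.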
CAVEATS
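The `-n` option is only meaningful together with `-b` and multibyte character encodings such as UTF-8. Because `-b` selects raw bytes, a range can end in the middle of a multibyte character and produce invalid output. POSIX defines `-n` to prevent such splits, but GNU `cut` ignores the option, so do not rely on it for protection on systems that use GNU coreutils.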
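The byte-splitting effect can be observed with a short experiment. This is a sketch that assumes `od` and `tr` are available; the input bytes are written as octal escapes so the example does not depend on the encoding of the script itself.

```shell
# "é" is two bytes in UTF-8 (0xC3 0xA9); the input line here is "été".
# Selecting only byte 1 keeps 0xC3 and drops 0xA9, leaving an incomplete character.
first_byte_hex=$(printf '\303\251t\303\251\n' | cut -b 1 | od -An -tx1 | tr -d ' \n')

# Hex dump of what cut wrote: the lone lead byte followed by the newline.
printf '%s\n' "$first_byte_hex"
```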
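When using `-f` without specifying `-d`, the delimiter is the TAB character. The delimiter is always a single character, and consecutive delimiters are not merged: input whose columns are aligned with runs of spaces contains no TAB at all, and splitting it with `-d ' '` yields an empty field between every pair of adjacent spaces.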
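The delimiter caveat is easiest to see on a small example. The sketch below uses a made-up row; squeezing repeated spaces with `tr -s` is one common workaround, not the only one.

```shell
# With the default TAB delimiter, a space is ordinary data inside a field.
tab_field=$(printf 'a\tb c\n' | cut -f 2)

# Hypothetical space-aligned row, as produced by many listing commands.
row='alice     42   staff'

# Every single space is a delimiter, so field 2 is the empty string
# between the first and second space.
naive=$(printf '%s\n' "$row" | cut -d ' ' -f 2)

# Squeezing runs of spaces first makes the field numbers line up.
squeezed=$(printf '%s\n' "$row" | tr -s ' ' | cut -d ' ' -f 2)

printf '[%s] [%s] [%s]\n' "$tab_field" "$naive" "$squeezed"
```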
LIST SYNTAX
The LIST parameter used with `-b`, `-c`, and `-f` specifies which bytes, characters, or fields to extract, numbered from 1. It can be a single number, a range (e.g., `1-5`), a list of numbers (e.g., `1,3,5`), or a combination of ranges and numbers (e.g., `1-3,5,8-10`). A missing start number in a range means "the first", and a missing end number means "the last". For example, `-c -5` means characters 1 through 5, and `-c 5-` means character 5 through the end of the line. Selected items are written in the order they appear in the input, each at most once, regardless of the order in which LIST names them.
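The LIST forms described above can be tried on a ten-character line. The sample string and variable names are assumptions for illustration only.

```shell
line='abcdefghij'

first_five=$(printf '%s\n' "$line" | cut -c -5)      # characters 1 through 5
from_fifth=$(printf '%s\n' "$line" | cut -c 5-)      # character 5 to end of line
mixed=$(printf '%s\n' "$line" | cut -c 1-3,5,8-10)   # ranges and single positions combined

# Output follows input order, so listing 3 before 1 does not swap the characters.
reordered=$(printf '%s\n' "$line" | cut -c 3,1)

printf '%s\n' "$first_five" "$from_fifth" "$mixed" "$reordered"
```

To actually reorder columns, a tool such as `awk` is the usual choice, since `cut` cannot do it.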
EXIT STATUS
The `cut` command returns an exit status of 0 on success and a non-zero value on error. Errors can result from invalid options, insufficient file permissions, or a file that does not exist.
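A script can branch on the exit status in the usual shell way. In this sketch the path is assumed not to exist and is used only to provoke an error; the variable names are illustrative.

```shell
# A path assumed not to exist, used only to trigger a failure.
missing='/nonexistent/cut-demo-input'

# Failure case: cut cannot open the file, so the else branch runs.
if cut -f 1 "$missing" 2>/dev/null; then
    status_msg='cut succeeded'
else
    status_msg='cut failed'
fi

# Success case: valid options and readable input.
printf 'a\tb\n' | cut -f 1 >/dev/null
ok_status=$?

printf '%s (success case exit status: %s)\n' "$status_msg" "$ok_status"
```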
HISTORY
The `cut` command is a long-standing utility on Unix-like operating systems, originating in early versions of Unix. It was designed as a basic text processing tool for extracting columns or sections of lines from files. It is now specified by POSIX, which keeps its behavior consistent across Unix-like systems. The core functionality has remained largely unchanged, staying focused on simple, efficient extraction.