sort
Sort lines of text files
TLDR
Sort a file in ascending order
Sort a file in descending order
Sort a file in case-insensitive way
Sort a file using numeric rather than alphabetic order
Sort /etc/passwd by the 3rd field of each line numerically, using ":" as a field separator
As above, but when items in the 3rd field are equal, sort by the 4th field as general numeric values (which may include exponents)
Sort a file preserving only unique lines
Sort a file, printing the output to the specified output file (can be used to sort a file in-place)
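The tasks listed above correspond to invocations along the following lines. This is a sketch rather than a canonical list: sample.txt and users.txt are placeholder files created only for the demonstration, and a small passwd-style file stands in for /etc/passwd so the snippet is self-contained. LC_ALL=C is set as an assumption so that the results do not depend on the locale.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Placeholder input files (hypothetical data, for illustration only).
printf '10\n9\nbeta\nAlpha\n9\n' > sample.txt
printf 'carol:x:1001:100::/home/carol:/bin/sh\n' >  users.txt
printf 'root:x:0:0::/root:/bin/sh\n'             >> users.txt
printf 'alice:x:1001:50::/home/alice:/bin/sh\n'  >> users.txt

export LC_ALL=C   # assumption: byte order, independent of the system locale

sort sample.txt                       # ascending order
sort -r sample.txt                    # descending order
sort -f sample.txt                    # case-insensitive
sort -n sample.txt                    # numeric rather than alphabetic
sort -t : -k 3,3n users.txt           # 3rd ":"-separated field, numerically
sort -t : -k 3,3n -k 4,4g users.txt   # ties on field 3 broken by field 4 (general numeric)
sort -u sample.txt                    # unique lines only
sort -o sample.txt sample.txt         # write to a file; here, sort in place
```

Because sort reads all of its input before opening the output file, naming the same file for input and -o is safe, unlike redirecting with > onto the input file.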
SYNOPSIS
sort [OPTION]... [FILE]...
sort [OPTION]... --files-from=F
PARAMETERS
-r, --reverse
Reverse the result of comparisons, sorting in descending order.
-n, --numeric-sort
Compare according to string numerical value, so that, for example, 9 sorts before 10.
-k POS1[,POS2], --key=POS1[,POS2]
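Sort via a key; the key is the part of the line from POS1 to POS2 (inclusive). Each POS has the form F[.C][OPTS], where F is a field number and C is a character position within that field, both counted from 1. If POS2 is omitted, the key extends to the end of the line.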
-u, --unique
With -c, check for strict ordering; without -c, output only the first of a run of multiple identical lines, effectively removing duplicates.
-o FILE, --output=FILE
Write the sorted result to FILE instead of standard output.
-f, --ignore-case, --fold-case
Fold lower case characters to upper case characters when sorting, making comparisons case-insensitive.
-b, --ignore-leading-blanks
Ignore leading blanks when locating the start of each sort key.
-t SEP, --field-separator=SEP
Use SEP instead of non-blank to blank transition as the field separator for -k option.
-V, --version-sort
Sort text that contains version numbers, e.g., 'v1.0', 'v2.1', 'v10.0'.
-c, --check, --check=diagnose-first
Check if input is already sorted; do not sort. Exit status is 0 if the input is sorted, 1 if it is not, and 2 if an error occurred.
-m, --merge
Merge already sorted files; do not sort the input, only combine them into one sorted output.
--help
Display a help message and exit.
--version
Output version information and exit.
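A few of the options above in action. This is a minimal sketch that assumes GNU sort (needed for -V); the data and the file names first.txt and second.txt are made up for the demonstration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
export LC_ALL=C   # assumption: byte order, so results are locale-independent

# -V: version sort orders embedded numbers naturally.
versions=$(printf 'v10.0\nv1.0\nv2.1\n' | sort -V)
echo "$versions"

# -t and -k: sort comma-separated records on the 2nd field only, numerically.
by_score=$(printf 'ann,30\nbob,4\ncat,12\n' | sort -t , -k 2,2n)
echo "$by_score"

# -c: report whether the input is already sorted (exit 0) or not (exit 1).
if printf 'a\nc\nb\n' | sort -c 2>/dev/null; then
    check_status=sorted
else
    check_status=unsorted
fi
echo "$check_status"

# -m: merge two files that are each already sorted.
printf 'a\nc\ne\n' > first.txt
printf 'b\nd\n'    > second.txt
merged=$(sort -m first.txt second.txt)
echo "$merged"
```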
DESCRIPTION
The sort command is a fundamental Unix/Linux utility used for rearranging the lines of text files in a defined order. It processes input from specified files or standard input and writes the sorted result to standard output or a designated file. By default, sort compares entire lines using the collating sequence of the current locale; in the C (POSIX) locale this amounts to a byte-by-byte comparison. Its real strength lies in its range of options, which let users specify complex sorting criteria.
Users can easily perform numeric sorting, reverse the order, ignore case, skip leading blanks, or sort based on specific fields (keys) within a line using custom delimiters. It's an indispensable tool for data preprocessing, log analysis, and preparing data for subsequent command-line operations. sort can also efficiently check if a file is already sorted or merge multiple pre-sorted files, making it a versatile cornerstone of shell scripting and command-line data manipulation.
CAVEATS
- Memory and Disk Usage: When the input does not fit in sort's memory buffer, it spills sorted chunks to temporary files. For very large files this can require temporary disk space comparable to the size of the input and can noticeably slow the sort.
- Locale Dependency: The sorting order can be heavily influenced by the current locale settings (e.g., LC_ALL, LC_COLLATE). Different locales might yield different sorted outputs, especially for non-ASCII characters. For byte-by-byte sorting, consider setting LC_ALL=C.
- Key Definition Complexity: Defining complex sort keys using -k can be intricate, especially when dealing with character positions, field numbers, and various modifiers within a single key definition.
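One common pitfall with -k is forgetting the end position: -k 2 means "from field 2 to the end of the line", while -k 2,2 restricts the key to field 2. The sketch below uses made-up data to show that the two can order the same input differently, and how per-key modifiers such as n and r attach to a single key.

```shell
export LC_ALL=C   # assumption: byte order, so results are locale-independent

input='y 1 a
x 1 b'

# Key runs from field 2 to end of line, so field 3 takes part in the comparison.
open_key=$(printf '%s\n' "$input" | sort -k 2)

# Key is field 2 only; the keys tie, so the whole line decides the order.
closed_key=$(printf '%s\n' "$input" | sort -k 2,2)

echo "$open_key"
echo "$closed_key"

# Modifiers apply per key: field 1 as text, then field 2 numerically, reversed.
mixed=$(printf 'a 1\na 10\nb 5\n' | sort -k 1,1 -k 2,2nr)
echo "$mixed"
```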
LOCALE AND COLLATION
The behavior of sort regarding character ordering is governed by the LC_COLLATE category of the current locale. If you need a consistent byte-by-byte sort (ASCII order), set LC_ALL to C or POSIX for the command (e.g., LC_ALL=C sort file.txt). Setting LC_COLLATE alone works only when LC_ALL is unset, because LC_ALL overrides every individual LC_* category. Using the C locale gives predictable results across different systems and locales.
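A small illustration of the difference, using made-up input. Only the C-locale result is fixed; what the second command prints depends on the locale configured on the system where it runs.

```shell
# In the C locale, comparison is by byte value, so all uppercase ASCII
# letters sort before all lowercase ones.
c_order=$(printf 'b\nB\na\nA\n' | LC_ALL=C sort)
echo "$c_order"

# With the ambient locale the result may differ; many locales interleave
# upper and lower case instead of grouping them.
printf 'b\nB\na\nA\n' | sort
```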
STABLE SORT
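GNU sort is not stable by default. When two lines compare as equal according to the specified sort keys, sort falls back to a last-resort comparison of the entire lines, so their original relative order is not necessarily preserved. The -s (--stable) option disables this last-resort comparison, so that lines with equal keys keep their input order. This matters when performing multiple sorting passes on data where the order of equal elements is significant.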
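The effect of the -s (--stable) option in GNU sort shows up when two lines have equal keys: without it, ties are broken by comparing the whole line; with it, tied lines keep their input order. A minimal sketch with made-up data:

```shell
export LC_ALL=C   # assumption: byte order, so results are locale-independent

input='b 1
a 1
c 0'

# Keyed on field 2 only. Without -s, lines with equal keys are ordered by a
# last-resort comparison of the whole line.
default_order=$(printf '%s\n' "$input" | sort -k 2,2n)

# With -s, lines with equal keys keep their input order.
stable_order=$(printf '%s\n' "$input" | sort -s -k 2,2n)

echo "$default_order"
echo "$stable_order"
```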
EFFICIENCY FOR LARGE FILES
GNU sort can handle files larger than available memory. When the dataset does not fit in its memory buffer, it uses an external merge sort: the input is read in chunks, each chunk is sorted and written to a temporary file, and the sorted chunks are then merged into the final output. This makes it practical for sorting very large files on a single machine.
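GNU sort exposes a few options for tuning this behavior: -S sets the memory buffer size, -T chooses the directory for temporary files, and --parallel limits the number of sorts run concurrently. The sketch below assumes GNU coreutils; the buffer size, thread count, and file names are arbitrary illustrative values, and the input is far too small to actually spill to disk.

```shell
workdir=$(mktemp -d)

# Hypothetical input: the numbers 1000 down to 1, one per line.
seq 1000 -1 1 > "$workdir/big.txt"

# Sort numerically with a 64 MiB buffer, temporary files kept in $workdir,
# and at most two concurrent sorts; write the result with -o.
sort -n -S 64M -T "$workdir" --parallel=2 \
     -o "$workdir/big.sorted" "$workdir/big.txt"

head -n 3 "$workdir/big.sorted"
```

Pointing -T at a filesystem with plenty of free space is the usual reason to set it, since the temporary files can approach the size of the input.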
HISTORY
The sort command is one of the original utilities found in the AT&T Unix operating system, dating back to its earliest versions in the 1970s. It has been a core component of the Unix philosophy of small, focused tools that can be combined to perform complex tasks. Over the decades, it has been reimplemented and enhanced in various Unix-like systems, including GNU Coreutils for Linux, which added features such as stable sorting (-s), version sorting (-V), and improved performance for very large datasets.