LinuxCommandLibrary

sort

Sort lines of text files

TLDR

Sort a file in ascending order

$ sort [path/to/file]

Sort a file in descending order
$ sort [[-r|--reverse]] [path/to/file]

Sort a file case-insensitively
$ sort [[-f|--ignore-case]] [path/to/file]

Sort a file using numeric rather than alphabetic order
$ sort [[-n|--numeric-sort]] [path/to/file]

Sort /etc/passwd by the 3rd field of each line numerically, using ":" as a field separator
$ sort [[-t|--field-separator]] [:] [[-k|--key]] [3n] [/etc/passwd]

As above, but when the 3rd fields are equal, sort by the 4th field as general numbers (which may include exponents, e.g. 1.5e3)
$ sort [[-t|--field-separator]] [:] [[-k|--key]] [3,3n] [[-k|--key]] [4,4g] [/etc/passwd]

Sort a file preserving only unique lines
$ sort [[-u|--unique]] [path/to/file]

Sort a file, writing the output to the specified output file (can be used to sort a file in-place)
$ sort [[-o|--output]] [path/to/file] [path/to/file]

SYNOPSIS


sort [OPTION]... [FILE]...
sort [OPTION]... --files0-from=F

PARAMETERS

-r, --reverse
    
Reverse the result of comparisons, sorting in descending order.

-n, --numeric-sort
    
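Compare by the numeric value at the start of the line or key rather than as text, so that 9 sorts before 10. Exponents such as 1e3 are not recognized; use -g for those.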

-k POS1[,POS2], --key=POS1[,POS2]
    
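Sort via a key; the key is the part of the line from POS1 to POS2 (inclusive). Each POS has the form F[.C][OPTS]: F is the field number and C the character position within the field, both counting from 1, and OPTS are single-letter ordering options (such as b, n or r) that apply to this key only. If POS2 is omitted, the key extends to the end of the line. The option can be repeated to define secondary keys.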

-u, --unique
    
With -c, check for strict ordering; without -c, output only the first of each run of lines that compare equal (under the active keys and options such as -f), effectively removing duplicates.

-o FILE, --output=FILE
    
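Write the sorted result to FILE instead of standard output. FILE may also be one of the input files, so sort -o file file sorts a file in place (unlike sort file > file, where the shell truncates the input before sort reads it).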

-f, --ignore-case
    
Fold lowercase characters to uppercase when comparing, making comparisons case-insensitive.

-b, --ignore-leading-blanks
    
Ignore leading blanks in each sort key when comparing, so that varying amounts of column padding do not affect the order.

-t SEP, --field-separator=SEP
    
Use the character SEP as the field separator instead of the default transition from non-blank to blank characters. The fields referenced by -k are split this way.

-V, --version-sort
    
Natural sort of version numbers within text, so that, for example, 'v2.1' sorts before 'v10.0' (a plain text sort puts 'v10.0' first).

-c, --check, --check=diagnose-first
    
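Check whether the input is already sorted; do not sort. If it is not, report the first out-of-order line. The exit status is 0 if the input is sorted and 1 if it is not (GNU sort exits with 2 on other errors).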

-m, --merge
    
Merge already sorted files; do not sort the input, only combine them into one sorted output.

--help
    
Display a help message and exit.

--version
    
Output version information and exit.
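The following sketch combines several of the options above on a small, made-up colon-separated dataset; the data and variable names are illustrative only, and LC_ALL=C is set so the output does not depend on the local collation order.

```shell
#!/bin/sh
# Reproducible byte-order collation for this demo.
export LC_ALL=C

# Made-up input: name:dept:salary
data='carol:ops:900
alice:dev:1200
bob:dev:800
dave:ops:1200'

# Primary key: field 3, numeric, descending ("r" applies to this key only).
# Secondary key: field 1 as text, ascending, used to break ties.
by_salary=$(printf '%s\n' "$data" | sort -t ':' -k 3,3nr -k 1,1)
echo "$by_salary"

# -u uses the same comparison as the sort itself, so with -f the lines
# "apple" and "Apple" count as equal and only one of them is kept.
uniq_ci=$(printf 'apple\nApple\nbanana\n' | sort -f -u)
echo "$uniq_ci"

# -c only checks the order; exit status 0 means the input is sorted.
if printf '1\n2\n3\n' | sort -c -n; then checked=sorted; else checked=unsorted; fi
echo "$checked"
```

Note how the per-key modifiers (3,3nr) let one key be numeric and reversed while the tie-breaking key stays a plain ascending text comparison.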

DESCRIPTION

The sort command is a fundamental Unix/Linux utility for arranging the lines of text files in a defined order. It reads from the specified files or standard input and writes the sorted result to standard output or a designated file. By default, sort compares entire lines as text using the collation order of the current locale; only in the C or POSIX locale is this a plain byte-by-byte comparison. Its options let users specify far more complex sorting criteria.

Options cover numeric sorting, reversing the order, ignoring case, skipping leading blanks, and sorting on specific fields (keys) within a line using a custom delimiter. This makes sort a common building block for data preprocessing, log analysis, and preparing input for other commands in a pipeline. It can also check whether a file is already sorted (-c) or merge several pre-sorted files (-m).
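A minimal illustration of the difference between the default text comparison and -n, assuming the C locale is forced for reproducibility (the variable names are just for this example):

```shell
#!/bin/sh
# Force byte-order collation so the result does not depend on the system locale.
export LC_ALL=C

# Default: whole lines compared as text, character by character,
# so "100" sorts before "9" because '1' < '9'.
text_order=$(printf '10\n9\n100\n' | sort)

# -n: lines compared by numeric value instead.
num_order=$(printf '10\n9\n100\n' | sort -n)

echo "text order:";    echo "$text_order"
echo "numeric order:"; echo "$num_order"
```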

CAVEATS

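  • Memory and Disk Usage: When the input is larger than the memory buffer, sort writes intermediate sorted runs to temporary files (in $TMPDIR, or the directory given by -T, defaulting to /tmp), so very large inputs can need substantial free disk space and run more slowly.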
  • Locale Dependency: The sorting order can be heavily influenced by the current locale settings (e.g., LC_ALL, LC_COLLATE). Different locales might yield different sorted outputs, especially for non-ASCII characters. For byte-by-byte sorting, consider setting LC_ALL=C.
  • Key Definition Complexity: Defining complex sort keys using -k can be intricate, especially when dealing with character positions, field numbers, and various modifiers within a single key definition.
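One of the most common key-definition mistakes is writing -k2 when -k2,2 is meant. The sketch below uses two made-up lines whose second fields are equal, with LC_ALL=C assumed for a reproducible order:

```shell
#!/bin/sh
export LC_ALL=C

# Made-up input: the second fields are equal, the third fields differ.
data='apple 1 zebra
berry 1 aaa'

# -k2 means "from field 2 to the END of the line", so the third
# column silently takes part in the comparison ("1 aaa" < "1 zebra").
to_eol=$(printf '%s\n' "$data" | sort -k2)

# -k2,2 restricts the key to field 2 only; the tie is then broken by
# sort's last-resort comparison of the whole line ("apple" < "berry").
field_only=$(printf '%s\n' "$data" | sort -k2,2)

echo "$to_eol"
echo "---"
echo "$field_only"
```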

LOCALE AND COLLATION

The ordering of characters is governed by the LC_COLLATE category of the current locale. For a consistent byte-by-byte order (ASCII order for ASCII text), set LC_ALL to C or POSIX, e.g. LC_ALL=C sort file.txt. Setting only LC_COLLATE=C is not enough when LC_ALL is also set, because LC_ALL overrides every other locale variable. A fixed collation gives predictable results across systems and locales.
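A small sketch of what the C locale does to mixed-case input; the second command is included only for comparison, and its output depends on whichever locale is active on your system:

```shell
#!/bin/sh
# In the C locale, lines are compared byte by byte, so all ASCII
# uppercase letters (0x41-0x5A) sort before lowercase ones (0x61-0x7A).
c_order=$(printf 'b\nB\na\nA\n' | LC_ALL=C sort)
echo "$c_order"

# The same input sorted under the current locale; many UTF-8 locales
# interleave the cases instead, but the exact result is locale-dependent.
printf 'b\nB\na\nA\n' | sort
```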

STABLE SORT

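By default, GNU sort is not stable: when two lines compare equal on all specified keys, it falls back to comparing the entire lines as a last resort (honoring -r but no other ordering option). The -s (--stable) option disables this last-resort comparison, so lines with equal keys keep their original relative order; -u also disables it. Stability matters when sorting data in several passes, where the order established by an earlier pass must survive a later one.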
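The sketch below shows how GNU sort orders lines with equal keys with and without -s, and how a stable second pass preserves the order from a first pass. The data is made up and LC_ALL=C is assumed for reproducibility.

```shell
#!/bin/sh
export LC_ALL=C

# Made-up input: "name value"; two lines share the value 1.
data='b 1
a 1
c 0'

# Default: ties on field 2 are broken by the last-resort whole-line
# comparison, so "a 1" comes before "b 1" regardless of input order.
default_order=$(printf '%s\n' "$data" | sort -k2,2n)

# -s: ties keep their input order ("b 1" was read before "a 1").
stable_order=$(printf '%s\n' "$data" | sort -s -k2,2n)

# Two passes: first by field 1 descending, then a stable sort by field 2.
# Lines with equal field 2 stay in descending order of field 1.
two_pass=$(printf '%s\n' "$data" | sort -k1,1r | sort -s -k2,2n)

echo "$default_order"; echo "---"
echo "$stable_order";  echo "---"
echo "$two_pass"
```

Without -s in the second pass, the last-resort comparison would discard the descending order produced by the first pass.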

EFFICIENCY FOR LARGE FILES

GNU sort is optimized for large inputs. When the data does not fit in its memory buffer, it performs an external merge sort: it splits the input into chunks, sorts each chunk in memory, writes the sorted chunks to temporary files, and then merges them. The buffer size can be set with -S, and --parallel lets it sort using several threads. Together these let it handle files much larger than the available RAM on a single machine.
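To make the chunk/sort/merge idea concrete, the following sketch performs the same steps by hand with split and sort -m, then checks the result against a direct sort. It is an illustration only (GNU sort does this internally); it assumes GNU coreutils for --parallel and mktemp -d, and the file names are arbitrary.

```shell
#!/bin/sh
set -e
export LC_ALL=C

# Scratch directory, removed when the script exits.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# 10,000 numbers in descending order as unsorted test input.
seq 10000 -1 1 > "$work/input"

# 1. Break the input into chunks of 1,000 lines (chunk_aa, chunk_ab, ...).
split -l 1000 "$work/input" "$work/chunk_"

# 2. Sort each chunk on its own, in place via -o.
for f in "$work"/chunk_*; do
  sort -n -o "$f" "$f"
done

# 3. Merge the already-sorted chunks in a single pass with -m.
sort -n -m "$work"/chunk_* > "$work/merged"

# The same job done directly, with a 1 MiB buffer (-S), temporary files
# in the scratch directory (-T) and two threads (--parallel, GNU only).
sort -n -S 1M -T "$work" --parallel=2 "$work/input" > "$work/direct"

cmp "$work/merged" "$work/direct" && echo "identical"
```

The merge step only needs to look at the head of each chunk, which is why it works on inputs that never fit in memory at once.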

HISTORY

The sort command is one of the original utilities of the AT&T Unix operating system, dating back to its earliest versions in the 1970s. It exemplifies the Unix philosophy of small, focused tools that can be combined to perform complex tasks. Over the decades it has been reimplemented and extended in various Unix-like systems, including GNU Coreutils on Linux, which added features such as a stable-sort option, version sorting, and better performance on very large datasets.

SEE ALSO

uniq(1), tac(1), cut(1), comm(1), grep(1), awk(1), sed(1)
