split

Divide a file into multiple smaller files

TLDR

Split a file, each split having 10 lines (except the last split)

$ split [[-l|--lines]] 10 [path/to/file]

Split a file into 5 files of equal size (except the last split)

$ split [[-n|--number]] 5 [path/to/file]

Split a file with 512 bytes in each split (except the last split; use 512k for kilobytes and 512m for megabytes)

$ split [[-b|--bytes]] 512 [path/to/file]

Split a file with at most 512 bytes in each split without breaking lines

$ split [[-C|--line-bytes]] 512 [path/to/file]

Split into multiple files from stdin

$ gzip [[-cd|--stdout --decompress]] [path/to/compressed_file.gz] | split [[-l|--lines]] [1000] - [path/to/output]

SYNOPSIS

split [OPTION]... [FILE [PREFIX]]

FILE: The input file to be split. If omitted or '-', reads from standard input.
PREFIX: The prefix for output file names. Default is 'x'.

-a, --suffix-length=N
    Generate suffixes of length N (default 2).

-b, --bytes=SIZE
    Split the input file into chunks, each SIZE bytes long (the last chunk may be shorter). SIZE is an integer with an optional unit: K, M, G, and so on are powers of 1024 (100M is 100*1024*1024 bytes), while KB, MB, GB, and so on are powers of 1000.

-C, --line-bytes=SIZE
    Similar to -b, but puts as many complete lines as possible into each output file without exceeding SIZE bytes. In GNU split, a single line longer than SIZE is broken across multiple files.

-d, --numeric-suffixes[=FROM]
    Use numeric suffixes (00, 01, ...) instead of alphabetic ones. The short form -d always starts at 0; the long form accepts an optional starting number FROM (default 0).

-e, --elide-empty-files
    Do not generate empty output files when using the -n option.

-l, --lines=NUMBER
    Split the input file into chunks, each NUMBER lines long.

-n, --number=CHUNKS
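    Generate CHUNKS output files. CHUNKS may be N (split into N files by size, possibly cutting lines), K/N (write only the Kth of N chunks to standard output), l/N (split into N files without splitting lines), l/K/N (write the Kth of N line-based chunks to standard output), r/N (like l/N, but distribute lines round-robin), or r/K/N (likewise, but write only the Kth of N to standard output).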

Note: GNU split has no -p, --preserve-leading-zeros option. Numeric suffixes are always zero-padded to the suffix length (x00, x01, ... by default); a longer length such as --suffix-length=3 gives x000, x001, and so on.

-t, --separator=SEPARATOR
    Use SEPARATOR as the line delimiter instead of the default newline character.

--verbose
    Print a diagnostic message just before each output file is opened.
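The -t option can be tried on a small comma-delimited file. This is a minimal sketch assuming GNU split; the scratch directory, the file name records.txt, and the prefix rec_ are made up for illustration.

```shell
# Scratch directory so the example does not touch existing files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Five records, each terminated by a comma instead of a newline.
printf 'a,b,c,d,e,' > records.txt

# Treat ',' as the record separator and put two records in each file.
split -t ',' -l 2 records.txt rec_

# rec_aa holds "a,b,", rec_ab holds "c,d,", rec_ac holds "e,"
ls rec_*
```

With -t, the -l count applies to separator-terminated records rather than to newline-terminated lines.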

DESCRIPTION

split is a command-line utility that breaks a single file into multiple smaller files. The size of each output file can be set by number of lines (using -l), by number of bytes (using -b), or by a byte limit that keeps lines intact (using -C). Alternatively, split can divide a file into a specified number of chunks of roughly equal size (using -n).
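The difference between a plain byte limit and a line-preserving byte limit is easiest to see on a tiny input. The following sketch assumes GNU split; sample.txt and the prefixes bytes_ and lines_ are hypothetical names.

```shell
# Scratch directory so the example does not touch existing files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Four lines of 6 bytes each (5 letters plus newline), 24 bytes in total.
printf 'aaaaa\nbbbbb\nccccc\nddddd\n' > sample.txt

# -b 10 cuts every 10 bytes, even in the middle of a line:
# three files of 10, 10, and 4 bytes.
split -b 10 sample.txt bytes_

# -C 10 keeps lines whole; two lines (12 bytes) would exceed the limit,
# so each of the four files holds a single 6-byte line.
split -C 10 sample.txt lines_

wc -c bytes_* lines_*
```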

By default, split creates output files with a prefix of 'x' followed by two-character alphabetic suffixes (e.g., xaa, xab, xac). Users can customize the prefix, choose numeric suffixes (with -d or --numeric-suffixes), and set the suffix length (with --suffix-length). The command is particularly useful for very large files, making them easier to transfer, process, or store on systems with file size limitations. It also helps with parallel processing by breaking data into smaller, more manageable parts.
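The naming choices described above can be compared side by side. This is a sketch assuming GNU split; numbers.txt and the prefixes part_, num_, and chunk_ are invented for the example.

```shell
# Scratch directory so the example does not touch existing files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 30 > numbers.txt    # 30 lines, so 10 lines per file gives 3 files

# Default prefix and alphabetic suffixes: xaa xab xac
split -l 10 numbers.txt

# Custom prefix: part_aa part_ab part_ac
split -l 10 numbers.txt part_

# Numeric suffixes starting at 0: num_00 num_01 num_02
split -l 10 -d numbers.txt num_

# Numeric suffixes starting at 5, three digits wide:
# chunk_005 chunk_006 chunk_007
split -l 10 --numeric-suffixes=5 --suffix-length=3 numbers.txt chunk_

ls
```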

CAVEATS

Be cautious when splitting very large files into extremely small pieces, as it can generate a massive number of output files, potentially exhausting inode limits or cluttering directories.

The plain -n N mode splits by byte count and can cut a line in two. Use l/N to keep lines whole (chunk sizes then vary with line lengths) or r/N to distribute lines round-robin. With -C, no output file exceeds SIZE: in GNU split, a line longer than SIZE is broken across multiple files rather than kept whole.
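The chunking modes of -n behave quite differently on the same input, which a short experiment can show. The sketch below assumes GNU split; numbers.txt and the prefixes size_, line_, and rr_ are hypothetical.

```shell
# Scratch directory so the example does not touch existing files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 10 > numbers.txt    # 10 lines, 21 bytes in total

# Plain N: three chunks of 7 bytes each; a line may be cut in two.
split -n 3 numbers.txt size_

# l/N: three chunks, but every line stays whole.
split -n l/3 numbers.txt line_

# r/N: lines are dealt out round-robin, so rr_aa gets lines 1, 4, 7, 10.
split -n r/3 numbers.txt rr_

# l/K/N: write only the 2nd of 3 line-based chunks to standard output.
split -n l/2/3 numbers.txt
```

Only the round-robin files fail to reproduce the input when concatenated in name order, because their lines are interleaved.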

OUTPUT FILENAME PATTERN

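split generates output file names by appending suffixes to the specified PREFIX. By default, it uses two-character alphabetic suffixes (aa, ab, ac, ...). When no suffix length is given, GNU split widens the suffix automatically as the two-character names run out (..., xyz, xzaaa, xzaab, ...), so the names still sort in creation order. With an explicit --suffix-length, split reports an error once the suffixes are exhausted. Numeric suffixes can be chosen with -d or --numeric-suffixes.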
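Suffix widening can be observed by forcing more output files than two letters can name. This sketch assumes a reasonably recent GNU split (automatic widening is not available in every implementation); lines.txt and the directories auto and fixed are made-up names.

```shell
# Scratch directory so the example does not touch existing files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 700 > lines.txt     # one output file per line gives 700 files
mkdir auto fixed

# No suffix length given: names continue past xyz as xzaaa, xzaab, ...
split -l 1 lines.txt auto/x
ls auto | wc -l

# Explicit two-character suffix: split stops with an error after xzz,
# leaving the 676 files it already created.
split -l 1 --suffix-length=2 lines.txt fixed/x || echo "suffixes exhausted"
ls fixed | wc -l
```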

REASSEMBLING FILES

Files created by split can be reassembled with the cat command. For example, if you split 'original_file' into 'xaa', 'xab', 'xac', and so on, restore it with: cat x* > original_file_restored. The shell expands x* in sorted order, which matches the order of the default naming scheme, but make sure no unrelated files in the directory match the pattern. Output from -n r/N is an exception: its lines are interleaved across files, so plain concatenation does not reproduce the input.
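A round trip is a quick way to confirm that the pieces reassemble correctly. The following is a sketch; the prefix piece_ is an assumption chosen so the glob cannot match unrelated files.

```shell
# Scratch directory so the example does not touch existing files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 1000 > original_file

# Split into 1000-byte pieces with a distinctive prefix.
split -b 1000 original_file piece_

# The glob expands in sorted order: piece_aa, piece_ab, ...
cat piece_* > original_file_restored

# cmp exits with status 0 only when the two files are byte-for-byte equal.
cmp original_file original_file_restored && echo "identical"
```

Comparing with cmp (or a checksum) before deleting the pieces guards against a missing or misordered file.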

HISTORY

The split command is a fundamental utility that has been part of Unix-like operating systems since their early versions. It is included in the GNU Core Utilities (coreutils) package, which provides essential tools for file, text, and shell manipulation on Linux systems. Its core functionality has remained consistent over decades, proving its enduring utility for managing large data files.