LinuxCommandLibrary

split

Divide a file into multiple smaller files

TLDR

Split a file, each split having 10 lines (except the last split)

$ split [[-l|--lines]] 10 [path/to/file]

Split a file into 5 files of equal size (except the last split)

$ split [[-n|--number]] 5 [path/to/file]

Split a file with 512 bytes in each split (except the last split; use 512k for kilobytes and 512m for megabytes)
$ split [[-b|--bytes]] 512 [path/to/file]

Split a file with at most 512 bytes in each split without breaking lines
$ split [[-C|--line-bytes]] 512 [path/to/file]

SYNOPSIS

split [OPTION]... [INPUT [PREFIX]]

PARAMETERS

-a, --suffix-length=N
    Use suffixes of length N (default 2).

-b, --bytes=SIZE
    Put SIZE bytes per output file.

-C, --line-bytes=SIZE
    Put at most SIZE bytes of lines per output file.

-d, --numeric-suffixes
    Use numeric suffixes instead of alphabetic.

-e, --elide-empty-files
    Do not generate zero-length output files.

-l, --lines=NUMBER
    Put NUMBER lines per output file.

-n, --number=CHUNKS
    Generate CHUNKS output files.

--verbose
    Print a diagnostic message to standard error just before each output file is opened.

--help
    Display help information and exit.

--version
    Output version information and exit.

INPUT
    Input file to be split.

PREFIX
    Prefix for the names of the output files (default 'x').
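
As a minimal sketch of how INPUT and PREFIX fit together, the following runs in a scratch directory. The names `sample.txt` and `chunk_` are hypothetical and chosen only for illustration.

```shell
# Work in a scratch directory so the example does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: 25 numbered lines.
seq 1 25 > sample.txt

# INPUT is sample.txt, PREFIX is chunk_; each output file gets 10 lines.
split -l 10 sample.txt chunk_

# Lists chunk_aa, chunk_ab and chunk_ac (10, 10 and 5 lines).
ls chunk_*
```

Because PREFIX is positional, it can only be given after INPUT.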

DESCRIPTION

The `split` command divides a large file into smaller, more manageable pieces. This is useful for transferring large files over networks, backing them up, or processing them in environments with limited memory or file size limits. Pieces can be sized by bytes (with suffixes such as k and m for kilobytes and megabytes), by line count, or by a target number of output files. By default, output files are named with the prefix 'x' followed by a two-character suffix (aa, ab, ac, ...); the prefix can be changed by passing a PREFIX argument after INPUT. `split` is commonly used in scripting and automation workflows. Concatenating the pieces in suffix order reproduces a file identical to the original.

CAVEATS

If no input file is specified, or INPUT is `-`, `split` reads from standard input. Very large byte or line counts might cause memory issues, depending on system resources.
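
Reading from standard input can be sketched as follows. The data comes from `seq` and the prefix `piped_` is a hypothetical name used only for illustration.

```shell
# Work in a scratch directory so the example does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# No INPUT operand: split reads the 1000 lines that seq writes to the pipe
# and uses the default prefix, producing xaa, xab and xac.
seq 1 1000 | split -l 400

# PREFIX must follow INPUT, so "-" stands in for standard input here.
seq 1 1000 | split -l 400 - piped_

ls
```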

JOINING SPLIT FILES

To rejoin files that have been split, the `cat` command is commonly used. For example, if you split a file named 'bigfile.txt' into files prefixed with 'part-', you can rejoin them using: `cat part-* > bigfile_recombined.txt`. The shell expands `part-*` in sorted order, which matches the order of the suffixes, so the pieces are concatenated in the right sequence. Always compare the MD5 sum of the original file with that of the recombined file to confirm data integrity.
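
The round trip described above can be sketched as follows. The input here is a hypothetical stand-in for 'bigfile.txt' made of random bytes, and the 32 KiB piece size is an arbitrary choice for the example.

```shell
# Work in a scratch directory so the example does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for bigfile.txt: 100 KiB of random bytes.
head -c 102400 /dev/urandom > bigfile.txt

# Split into 32 KiB pieces named part-aa, part-ab, ...
split -b 32768 bigfile.txt part-

# The shell expands part-* in sorted order, which matches the suffix order.
cat part-* > bigfile_recombined.txt

# cmp exits with status 0 only if the two files are byte-for-byte identical.
cmp bigfile.txt bigfile_recombined.txt && echo "files match"

# Checksums are useful when the original is not on the same machine.
md5sum bigfile.txt bigfile_recombined.txt
```

`cmp` is the simpler check when both files are local; comparing checksums suits the case where the pieces were transferred elsewhere and only the original's checksum travels with them.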

DEFAULT BEHAVIOR

By default, `split` creates files with the prefix 'x' followed by a two-character suffix ('aa', 'ab', 'ac', etc.), which allows 26 × 26 = 676 distinct names. When a fixed suffix length is in effect and more output files are needed than it can name (that is, past xzz for two letters), `split` stops with an error; recent GNU coreutils versions instead lengthen the suffix automatically when -a is not given. If you expect many output files, set the suffix length explicitly with -a. Use -d for numeric suffixes instead of alphabetic ones.
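
A minimal sketch of the suffix options, assuming a `split` that supports -d (GNU coreutils does). The file name `many.txt` and the prefixes `alpha_` and `num_` are hypothetical.

```shell
# Work in a scratch directory so the example does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# 1000 one-line pieces need more names than the 26 * 26 = 676
# that two-letter suffixes provide.
seq 1 1000 > many.txt

# Three-letter suffixes: alpha_aaa, alpha_aab, ...
split -a 3 -l 1 many.txt alpha_

# Three-digit numeric suffixes: num_000, num_001, ..., num_999
split -d -a 3 -l 1 many.txt num_

# Count the pieces produced by each run.
ls alpha_* | wc -l
ls num_* | wc -l
```

Numeric suffixes of a fixed width also sort correctly in a shell glob, so `cat num_*` would reassemble the pieces in order.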

USE CASES

Common uses include splitting log files for easier analysis, breaking up large database dumps for easier transportation, and preparing files for systems with file size limitations.

HISTORY

The `split` command has been a standard utility in Unix-like operating systems for many years, predating the GNU coreutils implementation. Its purpose has remained consistent: to provide a simple and effective way to divide large files. Over time, the command has seen minor enhancements and standardizations across different Unix flavors and Linux distributions, primarily focused on ensuring consistent behavior and adding options for greater control over the splitting process.

SEE ALSO

cat(1), csplit(1), head(1), tail(1)
