split
Divide a file into multiple smaller files
TLDR
Split a file, each split having 10 lines (except the last split)
Split a file into 5 files of equal size (except the last split, which also takes any remainder)
Split a file with 512 bytes in each split (except the last split; use 512k for kilobytes and 512m for megabytes)
Split a file with at most 512 bytes in each split without breaking lines
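The four cases above can be sketched as follows. This is a minimal illustration that assumes GNU coreutils (for `-n` and `seq`). The file name `sample.txt` and the prefixes `lines_`, `equal_`, `bytes_` and `whole_` are hypothetical names chosen for the example.

```shell
# Work in a scratch directory so the generated pieces are easy to inspect.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: the numbers 1..1000, one per line.
seq 1 1000 > sample.txt

# 10 lines in each piece (except possibly the last).
split -l 10 sample.txt lines_

# 5 pieces of equal size (GNU -n); a piece may end in the middle of a line.
split -n 5 sample.txt equal_

# 512 bytes in each piece (except the last).
split -b 512 sample.txt bytes_

# At most 512 bytes in each piece, without breaking lines.
split -C 512 sample.txt whole_

# Show how many pieces each invocation produced.
for prefix in lines_ equal_ bytes_ whole_; do
    echo "$prefix: $(ls "$prefix"* | wc -l) pieces"
done
```

Giving each invocation its own prefix keeps the four sets of output files from overwriting one another in the same directory.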
SYNOPSIS
split [OPTION]... [INPUT [PREFIX]]
PARAMETERS
-a, --suffix-length=N
Use suffixes of length N (default 2).
-b, --bytes=SIZE
Put SIZE bytes per output file.
-C, --line-bytes=SIZE
Put at most SIZE bytes of lines per output file.
-d, --numeric-suffixes
Use numeric suffixes instead of alphabetic.
-e, --elide-empty-files
Do not generate zero-length output files.
-l, --lines=NUMBER
Put NUMBER lines per output file.
--verbose
Print a diagnostic message to standard error just before each output file is opened.
--help
Display help information and exit.
--version
Output version information and exit.
INPUT
Input file to split
PREFIX
Prefix for the names of the output files
DESCRIPTION
The `split` command in Linux divides a large file into smaller, more manageable pieces. This is particularly useful for transferring large files over networks, backing them up, or processing them in environments with limited memory or file size constraints. The command offers several options to control how the file is split, including the size of each output file (in bytes, kilobytes, megabytes, etc.) or the number of lines per file. Output files are named with a prefix ('x' unless PREFIX is given) followed by a two-character suffix (aa, ab, ac, ...). It is a fundamental utility for file manipulation and is commonly used in scripting and automation workflows. The split files can be rejoined to create a file identical to the original.
CAVEATS
If no input file is specified, or if INPUT is `-`, `split` reads from standard input. Large values for byte counts or line numbers might result in memory issues depending on system resources.
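Reading from standard input can be sketched as below. Because PREFIX is the second operand, `-` is passed as INPUT when a prefix is wanted. The prefix `piped_` is a hypothetical name, and `seq` is assumed to be available to generate sample data.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Pipe 30 lines into split; "-" stands for standard input,
# which leaves room for a custom prefix as the second operand.
seq 1 30 | split -l 10 - piped_

# Lists piped_aa, piped_ab and piped_ac.
ls piped_*
```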
JOINING SPLIT FILES
To rejoin files that have been split, use the `cat` command. For example, if you split a file named 'bigfile.txt' into files prefixed with 'part-', you can rejoin them using: `cat part-* > bigfile_recombined.txt`. The shell expands `part-*` in sorted order, which matches the order in which `split` assigned the suffixes. Compare a checksum (such as the MD5 sum) of the original file and the recombined file to confirm data integrity.
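The round trip described above might look like this. It is a sketch that assumes GNU coreutils for `md5sum` and `seq`; the generated 'bigfile.txt' is only a stand-in for a real large file.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for a large file.
seq 1 5000 > bigfile.txt

# Split into 4096-byte pieces named part-aa, part-ab, ...
split -b 4096 bigfile.txt part-

# Rejoin the pieces in suffix order.
cat part-* > bigfile_recombined.txt

# Byte-for-byte comparison; cmp exits with status 0 when the files match.
cmp bigfile.txt bigfile_recombined.txt && echo "files are identical"

# Print both checksums for a visual comparison.
md5sum bigfile.txt bigfile_recombined.txt
```

`cmp` is enough when both files are on the same machine. Checksums are more useful when the pieces were transferred elsewhere and only the digest of the original is at hand.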
DEFAULT BEHAVIOR
By default, `split` creates files with the prefix 'x' followed by a two-character suffix ('aa', 'ab', 'ac', etc.). If a split needs more output files than the suffix length allows (two letters give 676 names, ending at xzz), `split` stops with an error once the suffixes are exhausted; recent GNU versions instead lengthen the suffix automatically when -a is not given. If you expect many output files, use -a to set a longer suffix. Use the -d option for numeric suffixes.
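The naming rules above can be seen side by side in a short sketch. It assumes GNU coreutils (for `-d` and `seq`), and the prefixes `wide_` and `num_` are hypothetical.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: 30 lines, so each command below yields 3 pieces.
seq 1 30 > sample.txt

# Default naming: xaa, xab, xac.
split -l 10 sample.txt

# Three-character suffixes: wide_aaa, wide_aab, wide_aac.
split -l 10 -a 3 sample.txt wide_

# Numeric suffixes: num_00, num_01, num_02.
split -l 10 -d sample.txt num_

ls
```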
USE CASES
Common uses include splitting log files for easier analysis, breaking up large database dumps for easier transportation, and preparing files for systems with file size limitations.
HISTORY
The `split` command has been a standard utility in Unix-like operating systems for many years, predating the GNU coreutils implementation. Its purpose has remained consistent: to provide a simple and effective way to divide large files. Over time, the command has seen minor enhancements and standardizations across different Unix flavors and Linux distributions, primarily focused on ensuring consistent behavior and adding options for greater control over the splitting process.