LinuxCommandLibrary

csplit

Split a file into sections

TLDR

Split a file in two parts, starting the second one at line 10

$ csplit [path/to/file] 10

Split a file in three parts, starting the second and third parts at lines 7 and 23
$ csplit [path/to/file] 7 23

Start a new part at every 5th line (will fail if number of lines is not divisible by 5)
$ csplit [path/to/file] 5 {*}

Start a new part at every 5th line, ignoring exact-division error
$ csplit [[-k|--keep-files]] [path/to/file] 5 {*}

Split a file above line 5 and use a custom prefix for the output files (default is xx)
$ csplit [path/to/file] 5 [[-f|--prefix]] [prefix]

Split a file above the first line matching a regex pattern
$ csplit [path/to/file] /[regex]/

SYNOPSIS

csplit [OPTION]... FILE PATTERN...

PARAMETERS

-b, --suffix-format=FORMAT
    use sprintf FORMAT (e.g., %03d) for suffixes instead of the default %02d

-f, --prefix=PREFIX
    use PREFIX (default xx) for output filenames

-k, --keep-files
    keep all generated files on error

-n, --digits=DIGITS
    use DIGITS digits in filename suffixes instead of the default 2

-s, --quiet, --silent
    suppress printing file sizes

-z, --elide-empty-files
    do not keep empty output files

--help
    display help and exit

--version
    output version information and exit
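The naming and output options can be tried together in a scratch directory. This is a minimal sketch assuming GNU coreutils csplit (-b and -z are GNU extensions); the input names letters.txt and notes.txt and the prefixes part_, chunk, and piece are made up for illustration.

```shell
#!/bin/sh
# Sketch assuming GNU coreutils csplit; all file names are hypothetical.
set -e
cd "$(mktemp -d)"            # scratch directory, so nothing existing is overwritten

printf '%s\n' a b c d e f > letters.txt

# -f part_ sets the prefix, -n 3 gives three-digit suffixes:
# part_000 (a b), part_001 (c d), part_002 (e f)
csplit -s -f part_ -n 3 letters.txt 3 5

# -b sets a printf-style suffix format, here with an extension appended:
# chunk00.txt, chunk01.txt, chunk02.txt
csplit -s -f chunk -b '%02d.txt' letters.txt 3 5

# The first line matches, so the first piece would be empty; -z drops it.
printf '%s\n' '# one' x '# two' y > notes.txt
csplit -s -z -f piece notes.txt '/^#/' '{*}'
```

Without -s, each csplit call would also print the byte size of every file it writes.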

DESCRIPTION

csplit is a Unix/Linux utility that splits a file into multiple pieces at context-dependent points, such as lines matching a regular expression or given line numbers, rather than at fixed sizes. Unlike split, which cuts a file into chunks of a fixed number of lines or bytes, csplit determines each split point from the content.

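For example, to split a Go source file before each top-level function (lines matching the regex ^func), use: csplit file.go '/^func/' '{*}'. This creates xx00 (everything before the first match), xx01 (the first function), and so on up to xxNN, where the last file holds the final function through the end of the file. If the very first line matches, xx00 is empty unless -z is given.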
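A scaled-down run of that command makes the file layout concrete. This sketch assumes GNU coreutils csplit and uses a made-up three-section input, sample.go; -s only silences the byte counts.

```shell
#!/bin/sh
# Sketch assuming GNU coreutils csplit; sample.go is a hypothetical input.
set -e
cd "$(mktemp -d)"

cat > sample.go <<'EOF'
package main

func a() {}

func b() {}
EOF

# Split before every line starting with "func"; '{*}' repeats until no more matches.
csplit -s sample.go '/^func/' '{*}'

# xx00: the package clause, xx01: func a, xx02: func b through end of file
for f in xx*; do
    printf '== %s\n' "$f"
    cat "$f"
done
```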

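Patterns include:
- /REGEXP/: copy up to, but not including, the next line matching REGEXP; that line starts the next piece.
- %REGEXP%: skip (discard) input up to, but not including, the next matching line.
- LINE_NO: copy up to, but not including, line LINE_NO, so the next piece starts at that line.
- {N}: repeat the previous pattern N more times.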
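The three pattern kinds can be compared on one small input. This is a sketch assuming GNU coreutils csplit; input.txt and the prefixes re and skip are hypothetical names chosen so the outputs do not collide.

```shell
#!/bin/sh
# Sketch assuming GNU coreutils csplit; input.txt and the prefixes are hypothetical.
set -e
cd "$(mktemp -d)"
printf '%s\n' one two START three four > input.txt

# LINE_NO: xx00 = lines 1-2, xx01 = line 3 (START) to the end
csplit -s input.txt 3

# /REGEXP/: re00 = one two, re01 = START three four
csplit -s -f re input.txt '/^START/'

# %REGEXP%: "one" and "two" are discarded; skip00 = START three four
csplit -s -f skip input.txt '%^START%'
```

Note that the skipped section of a %REGEXP% pattern consumes no file number, so the remainder lands in skip00.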

By default, csplit prints the size in bytes of each output file it creates. Output files are named with the prefix xx and a two-digit suffix (xx00, xx01, ...). It suits logs, scripts, and other structured text, and handles large files efficiently, though patterns that match often can produce many small files.

CAVEATS

Output files are created in sequence (xx00, xx01, ...), and existing files with the same names are overwritten without warning. {*} repeats a pattern as often as it matches, so on large inputs it can produce a very large number of files. There is no option for case-insensitive matching.
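One way to sidestep both the overwriting and the missing case-insensitive flag is to write the pieces into a fresh directory through the -f prefix and to spell out both cases with bracket expressions. This sketch assumes GNU coreutils csplit and mktemp; app.log and its contents are invented.

```shell
#!/bin/sh
# Sketch assuming GNU coreutils csplit and mktemp; app.log is hypothetical.
set -e
cd "$(mktemp -d)"
printf '%s\n' boot 'Error: a' detail 'ERROR: b' 'error: c' > app.log

# A fresh directory for the pieces, so existing xx* files are never touched.
outdir=$(mktemp -d)

# No case-insensitive flag: bracket expressions match "error" in any case.
csplit -s -f "$outdir/part" app.log '/^[Ee][Rr][Rr][Oo][Rr]:/' '{*}'

ls "$outdir"    # expect part00 through part03
```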

PATTERN SYNTAX

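/REGEXP/[OFFSET]: copy up to, but not including, the matching line; a signed OFFSET such as +2 or -1 moves the split point that many lines after or before the match.
%REGEXP%[OFFSET]: like /REGEXP/, but the skipped section is discarded instead of written to a file.
NUM: copy up to, but not including, line NUM.
{N} or {*}: repeat the previous pattern N more times, or as many times as possible.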
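The effect of an OFFSET is easiest to see with a marker line. This is a sketch assuming GNU coreutils csplit; n.txt and the prefixes plus and minus are hypothetical.

```shell
#!/bin/sh
# Sketch assuming GNU coreutils csplit; n.txt and the prefixes are hypothetical.
set -e
cd "$(mktemp -d)"
printf '%s\n' 1 2 3 MARK 5 6 > n.txt

# +1: split one line after the match, so MARK ends the first piece
csplit -s -f plus n.txt '/MARK/+1'     # plus00 = 1 2 3 MARK, plus01 = 5 6

# -1: split one line before the match
csplit -s -f minus n.txt '/MARK/-1'    # minus00 = 1 2, minus01 = 3 MARK 5 6
```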

EXAMPLE

csplit logfile '/^ERROR:/' '{20}' '%^Date:%'
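Splits before each of the first 21 lines starting with ERROR: (the pattern once, plus 20 repetitions), then discards everything from the 21st such line up to the next line beginning with Date:. The rest of the file, starting at that Date: line, goes into the final output file. If either pattern cannot be matched as often as requested, csplit reports an error and removes the files it created (unless -k is given).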
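The same shape can be tried on a tiny log with {1} in place of {20}, so the regex is applied twice instead of 21 times. This is a sketch assuming GNU coreutils csplit; the log contents are invented, and -s is added only to hide the byte counts.

```shell
#!/bin/sh
# Sketch assuming GNU coreutils csplit; the log contents are hypothetical.
set -e
cd "$(mktemp -d)"
cat > logfile <<'EOF'
boot
ERROR: one
ERROR: two
ERROR: three
noise
Date: 2024-01-01
tail
EOF

# '/^ERROR:/' '{1}' applies the regex 1 + 1 = 2 times:
#   xx00 = boot, xx01 = ERROR: one
# '%^Date:%' then discards "ERROR: two" through "noise";
#   xx02 = the Date: line through the end of the file
csplit -s logfile '/^ERROR:/' '{1}' '%^Date:%'
```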

HISTORY

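csplit is specified by POSIX (including POSIX.1-2008). The GNU implementation, originally part of GNU textutils and now part of GNU coreutils, adds long options such as --keep-files and extensions such as -b/--suffix-format and -z/--elide-empty-files.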

SEE ALSO

split(1), cut(1), awk(1), sed(1)
