LinuxCommandLibrary

csplit

Split a file into sections

TLDR

Split a file at lines 5 and 23

$ csplit [path/to/file] 5 23

Split a file every 5 lines (this will fail if the total number of lines is not divisible by 5)
$ csplit [path/to/file] 5 {*}

Split a file every 5 lines, ignoring exact-division error
$ csplit [[-k|--keep-files]] [path/to/file] 5 {*}

Split a file at line 5 and use a custom prefix for the output files
$ csplit [path/to/file] 5 [[-f|--prefix]] [prefix]

Split a file at a line matching a regular expression
$ csplit [path/to/file] /[regular_expression]/

SYNOPSIS

csplit [OPTION]... FILE PATTERN...

PARAMETERS

-f, --prefix=PREFIX
    Use PREFIX instead of `xx`.

-b, --suffix-format=FORMAT
    Use sprintf FORMAT instead of `%02d`.

-n, --digits=DIGITS
    Use specified number of digits instead of 2.

-s, --quiet, --silent
    Do not print counts of output file sizes.

-k, --keep-files
    Do not remove output files on errors.

-z, --elide-empty-files
    Suppress generation of zero-length output files.

--help
    Display help and exit.

--version
    Output version information and exit.

PATTERN
    Where to split the file: a line number, a regular expression, or a repeat specifier (see PATTERN DESCRIPTION).
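
A minimal sketch of how the naming options combine, assuming GNU csplit. The file name records.txt, the `part_` prefix, and the `BEGIN` marker are made up for illustration.

```shell
# Work in a scratch directory; records.txt is a hypothetical sample file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'BEGIN' 'a' 'BEGIN' 'b' 'BEGIN' 'c' > records.txt

# -f sets the prefix and -b the printf-style suffix format.
# -z drops the empty first piece (line 1 already matches the pattern).
# -s suppresses the byte counts.
csplit -s -z -f part_ -b '%03d.txt' records.txt '/^BEGIN$/' '{*}'

# -n changes only the number of digits in the default suffix.
csplit -s -z -n 3 records.txt '/^BEGIN$/' '{*}'

ls
```

With GNU csplit this should leave part_000.txt to part_002.txt and xx000 to xx002, each holding one `BEGIN` record. The sequence numbers stay consecutive from 0 even though `-z` discarded the empty first piece.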

DESCRIPTION

The csplit command splits a file into sections determined by context lines. It reads the input file, writes each section to its own output file according to the given patterns (line numbers or regular expressions), and names the output files sequentially. It is useful for dividing large files into smaller chunks that are easier to process or analyze, especially log files and other text data where specific delimiter lines mark the boundaries between sections. Unlike `split`, which divides input by size or line count, csplit divides it by content.

Sections can be defined by line numbers, regular expressions, or a combination of both. csplit prints the size in bytes of each output file to standard output.
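
To illustrate the size report, the sketch below assumes GNU csplit and a hypothetical four-line file named four.txt. It splits at line 3 and captures what csplit prints.

```shell
# Work in a scratch directory; four.txt is a made-up sample file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' alpha beta gamma delta > four.txt

# Split before line 3: lines 1-2 go to xx00, lines 3-4 to xx01.
# csplit prints one byte count per output file on standard output.
sizes=$(csplit four.txt 3)
printf '%s\n' "$sizes"
```

The two counts are 11 and 12: xx00 holds "alpha" and "beta" (6 + 5 bytes including newlines), xx01 holds "gamma" and "delta" (6 + 6 bytes).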

CAVEATS

If an error occurs or a HUP, INT, or TERM signal is received, csplit removes the output files it has created, unless the `-k` option is specified.
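
The cleanup behavior can be observed with a deliberately failing split. This is a sketch assuming GNU csplit; seven.txt is a hypothetical 7-line file, and line 50 is chosen only because it does not exist.

```shell
# Scratch directory and a hypothetical 7-line sample file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 1 2 3 4 5 6 7 > seven.txt

# Line 50 does not exist, so csplit fails; without -k it deletes its output.
status_plain=0
csplit -s seven.txt 3 50 2>/dev/null || status_plain=$?
kept_plain=$(find . -name 'xx*' | wc -l | tr -d ' ')

# Same command with -k: the pieces written before the error stay on disk.
status_keep=0
csplit -s -k seven.txt 3 50 2>/dev/null || status_keep=$?
kept_keep=$(find . -name 'xx*' | wc -l | tr -d ' ')

echo "without -k: exit status $status_plain, files left: $kept_plain"
echo "with -k:    exit status $status_keep, files left: $kept_keep"
```

Both runs exit with a non-zero status. Only the `-k` run leaves xx00 (lines 1-2) and xx01 (lines 3-7) behind.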

PATTERN DESCRIPTION

PATTERN can be:
INTEGER: Copy up to but not including the specified line number.
/REGEXP/: Copy up to but not including a matching line.
%REGEXP%: Skip to but not including a matching line.
{INTEGER}: Repeat the previous pattern the specified number of times.
{*}: Repeat the previous pattern as many times as possible.
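
The pattern forms can be compared side by side on one small input. This is a sketch assuming GNU csplit; notes.txt and the `all_`, `skip_`, and `num_` prefixes are hypothetical. `{*}`, as used in the TLDR examples, repeats the previous pattern until the input runs out. The patterns are quoted so the shell passes them through unchanged.

```shell
# Scratch directory and a made-up 7-line sample with three "## " headings.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'preamble' '## one' 'a' '## two' 'b' '## three' 'c' > notes.txt

# /REGEXP/ with '{*}': start a new piece before every line beginning with "## ".
csplit -s -f all_ notes.txt '/^## /' '{*}'

# %REGEXP% then /REGEXP/: discard everything before "## two",
# then split before "## three".
csplit -s -f skip_ notes.txt '%^## two%' '/^## three/'

# INTEGER with {INTEGER}: split before line 3, then once more
# 3 lines further on (before line 6).
csplit -s -f num_ notes.txt 3 '{1}'

ls
```

The first command produces all_00 to all_03, where all_00 holds only the preamble. The second produces skip_00 and skip_01 with no preamble or "## one" section. The third produces num_00 (lines 1-2), num_01 (lines 3-5), and num_02 (lines 6-7).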

SEE ALSO

split(1), sed(1), awk(1)
