LinuxCommandLibrary

csvclean

CSV file validator and cleaner

TLDR

Clean a CSV file, writing the corrected rows and the rejected rows to separate files
$ csvclean [data.csv]
Validate and report errors without creating output files (dry run)
$ csvclean -n [data.csv]
Validate with custom delimiter
$ csvclean -d "[;]" [data.csv]
Check file with no header row
$ csvclean --no-header-row [data.csv]
Validate using specific encoding
$ csvclean -e [latin1] [data.csv]
Save error messages written to stderr to a file while cleaning
$ csvclean [data.csv] 2> [errors.txt]

SYNOPSIS

csvclean [options] file

DESCRIPTION

csvclean is a csvkit tool that validates and cleans CSV files. It detects common problems such as inconsistent column counts, stray quotes, and encoding issues, and either reports them or fixes them automatically.
When run without -n, it creates two output files alongside the input: one with the cleaned data ([basename]_out.csv) and one with the rows that had errors ([basename]_err.csv). Problematic rows can then be reviewed without losing data. With -n, it only reports errors and creates no files.
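The two modes described above can be tried on a small sample file. This is a minimal sketch, assuming a csvkit release in which -n and the output files behave as described here (behavior may differ between csvkit versions). The file name sample.csv and its contents are made up for illustration, and the script skips the csvclean calls if the tool is not installed.

```shell
#!/bin/sh
# Build a small sample file (hypothetical data): the third line has one field too many.
printf 'id,name\n1,alice\n2,bob,extra\n' > sample.csv

if command -v csvclean >/dev/null 2>&1; then
    # Dry run: report problems without creating any files.
    csvclean -n sample.csv || true
    # Full run: write the cleaned rows and the rejected rows to separate files.
    csvclean sample.csv || true
    # List whatever files now exist next to the input.
    ls sample*.csv
else
    echo "csvclean not found; install csvkit to run this example" >&2
fi
```

Running the dry run first shows what would be rejected before any files are written.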
The tool handles various CSV dialects, including files with different delimiters, quote characters, and encodings. It is typically run as a preprocessing step on messy data before analysis.

PARAMETERS

FILE
CSV file to clean or validate.
-n, --dry-run
Don't create output files, just report errors.
-d CHAR, --delimiter CHAR
Field delimiter (default: comma).
-t, --tabs
Use tabs as delimiter.
-q CHAR, --quotechar CHAR
Quote character (default: double quote).
-e ENCODING, --encoding ENCODING
Input file encoding.
-H, --no-header-row
File has no header row.

CAVEATS

Automatic cleaning may alter data in unexpected ways, so review the cleaned output carefully. Large files can be slow to process. Some edge cases in CSV formatting may not be detected. The original file is not modified.
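One quick way to review a cleaned file is to check independently that every row has the same number of fields as the header. The following is a rough sketch and not part of csvkit: it uses awk with a naive comma split, so it assumes no field contains a quoted comma or an embedded newline. The file name cleaned.csv and its contents are hypothetical.

```shell
#!/bin/sh
# Hypothetical stand-in for a cleaned file; the last row is deliberately short.
printf 'id,name,city\n1,alice,Paris\n2,bob\n' > cleaned.csv

# Compare each row's field count with the header's.
# Naive split on commas: assumes no quoted commas or embedded newlines.
bad_rows=$(awk -F',' '
    NR == 1 { expected = NF; next }
    NF != expected { printf "line %d: expected %d fields, found %d\n", NR, expected, NF }
' cleaned.csv)

if [ -n "$bad_rows" ]; then
    echo "$bad_rows"
else
    echo "all rows have the same number of fields"
fi
```

For files that do contain quoted delimiters, a CSV-aware tool is the safer choice; this check is only a fast first pass.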

HISTORY

csvclean is part of csvkit, created by Christopher Groskopf and first released in 2011. csvkit provides a suite of tools for working with CSV files, designed to bring the power of Unix philosophy to tabular data processing.

SEE ALSO

csvstat(1), csvcut(1), csvlook(1), csvkit(1)
