csvclean
CSV file validator and cleaner
SYNOPSIS
csvclean [options] file
DESCRIPTION
csvclean is a csvkit tool that validates and cleans CSV files. It detects common problems such as inconsistent column counts, stray quotes, and encoding issues, and either reports them or fixes them automatically.
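When run without -n, it creates two output files: one with the cleaned data and one with the rows that had errors (named [basename]_out.csv and [basename]_err.csv in csvkit 1.x). Problematic rows can then be reviewed without losing data. With -n, it only reports errors and creates no files. This describes csvkit before version 2.0; from 2.0 onward, csvclean writes cleaned rows to standard output and errors to standard error, and the -n option was removed.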
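The split into clean rows and error rows can be illustrated with a minimal Python sketch. This is not csvclean's implementation: it checks only the column count of each row against the header, using the standard csv module. The function name split_rows and the error message wording are hypothetical.

```python
import csv
import io


def split_rows(text, delimiter=","):
    """Split CSV text into (header, clean_rows, error_rows).

    A row counts as an error when its column count differs from the
    header's. Each error entry is (line_number, message, row).
    Assumes the text contains at least a header line.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    clean, errors = [], []
    for row in reader:
        if len(row) == len(header):
            clean.append(row)
        else:
            msg = f"Expected {len(header)} columns, found {len(row)}"
            # reader.line_num is the number of source lines read so far
            errors.append((reader.line_num, msg, row))
    return header, clean, errors


# The third line has one field too many and lands in the error list.
sample = "id,name\n1,Ada\n2,Grace,extra\n3,Linus\n"
header, clean, errors = split_rows(sample)
print(clean)
print(errors)
```

Keeping the line number and a message next to each rejected row is what makes the error output reviewable later.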
The tool handles various CSV dialects and works with files that use different delimiters, quote characters, and encodings. It is useful for preprocessing messy data before analysis.
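To show what the dialect settings mean in practice, here is a small Python sketch that decodes a file's bytes and parses them with an explicit delimiter and quote character. It uses only the standard csv module and is an illustration, not csvclean's code. The helper name read_rows is hypothetical; its encoding, delimiter, and quotechar arguments correspond loosely to the -e, -d, and -q options.

```python
import csv
import io


def read_rows(raw, encoding="utf-8", delimiter=",", quotechar='"'):
    """Decode raw bytes and parse them with the given CSV dialect settings."""
    text = raw.decode(encoding)
    # newline="" leaves line endings untouched so the csv module handles them
    buffer = io.StringIO(text, newline="")
    return list(csv.reader(buffer, delimiter=delimiter, quotechar=quotechar))


# Semicolon-delimited, single-quoted, Latin-1 encoded input.
raw = "name;city\n'Müller; Hans';Köln\n".encode("latin-1")
rows = read_rows(raw, encoding="latin-1", delimiter=";", quotechar="'")
print(rows)
```

Parsing the same bytes with the default comma delimiter would yield one field per line, which is why the dialect has to be stated when it differs from the defaults.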
PARAMETERS
FILE
CSV file to clean or validate.
-n, --dry-run
Don't create output files, just report errors.
-d CHAR, --delimiter CHAR
Field delimiter (default: comma).
-t, --tabs
Use tabs as delimiter.
-q CHAR, --quotechar CHAR
Quote character (default: double quote).
-e ENCODING, --encoding ENCODING
Input file encoding.
-H, --no-header-row
File has no header row.
-I, --no-inference
Disable type inference. In csvkit the short flag for this option is -I, not -H, and csvclean may not accept it in every version; check csvclean --help.
CAVEATS
Automatic cleaning may alter data in unexpected ways, so review the cleaned output carefully. Large files can be slow to process. Some edge cases in CSV formatting may go undetected. The original file is not modified.
HISTORY
csvclean is part of csvkit, created by Christopher Groskopf and first released in 2011. csvkit provides a suite of tools for working with CSV files, designed to bring the power of Unix philosophy to tabular data processing.

