LinuxCommandLibrary

csvgrep

Search CSV files based on column content

TLDR

Find rows that have a certain string in column 1

$ csvgrep [[-c|--columns]] [1] [[-m|--match]] [string_to_match] [data.csv]
copy

Find rows in which columns 3 or 4 match a certain regular expression
$ csvgrep [[-c|--columns]] [3,4] [[-r|--regex]] [regular_expression] [data.csv]
copy

Find rows in which the "name" column does NOT include the string "John Doe"
$ csvgrep [[-i|--invert-match]] [[-c|--columns]] [name] [[-m|--match]] "[John Doe]" [data.csv]
copy

SYNOPSIS

csvgrep [OPTIONS] column pattern file.csv

PARAMETERS

column
    Index or name of the column to search. Column indices start at 1. Column names are case-sensitive and must match the header row in the CSV file.

pattern
    Regular expression to match. Rows containing a matching value in the specified column will be included in the output.

file.csv
    The CSV file to search. If omitted, csvgrep reads from standard input.

-i, --ignore-case
    Ignore case distinctions in both the pattern and the input data.

-r, --re-flags=FLAGS
    Passes flags directly to the regex engine. see python documentation.

-n, --names
    Use the first row as column names. Must be included if using column names rather than column indexes

-H, --header
    Show header row in output. Defaults to true

-v, --invert-match
    Select non-matching lines.

-d, --delimiter=DELIMITER
    Specify the field delimiter. Defaults to a comma (,).

-e, --encoding=ENCODING
    Specify the file encoding.

-l, --linenumbers
    Show line numbers (1-based).

-p, --preamble=LINES
    Rows to skip before the header row.

-z, --zero-based
    Column indices start at 0.

DESCRIPTION

The csvgrep command is a powerful tool for searching and filtering data within Comma Separated Value (CSV) files. It allows users to extract specific rows from a CSV file based on matching criteria defined by regular expressions or string comparisons applied to specific columns.
Unlike basic grep which operates line by line, csvgrep understands the structure of CSV data and avoids matching within quoted fields or across multiple rows that constitute a single record. This makes it much more reliable and effective for working with structured data than simple text searching.
The command supports various options for specifying the column to search, the matching pattern, case-sensitivity, and output formatting. This makes it highly versatile for a range of data analysis and manipulation tasks.

CAVEATS

csvgrep relies on the CSV file being well-formed. Inconsistent quoting or delimiter characters can lead to incorrect results. Column names are case sensitive.

EXAMPLES

To find all rows where the 'name' column contains 'john':
csvgrep -n name john data.csv

To find all rows where the third column matches the pattern '^[A-Z]{3}$':
csvgrep 3 '^[A-Z]{3}$' data.csv

SEE ALSO

grep(1), csvlook(1), csvcut(1)

Copied to clipboard