csvgrep
Search CSV files based on column content
TLDR
Find rows that have a certain string in column 1
Find rows in which columns 3 or 4 match a certain regular expression
Find rows in which the "name" column does NOT include the string "John Doe"
SYNOPSIS
csvgrep [OPTIONS] column pattern file.csv
PARAMETERS
column
Index or name of the column to search. Column indices start at 1. Column names are case-sensitive and must match the header row in the CSV file.
pattern
Regular expression to match. Rows containing a matching value in the specified column will be included in the output.
file.csv
The CSV file to search. If omitted, csvgrep reads from standard input.
-i, --ignore-case
Ignore case distinctions in both the pattern and the input data.
-r, --re-flags=FLAGS
Passes flags directly to the regex engine. see python documentation.
-n, --names
Use the first row as column names. Must be included if using column names rather than column indexes
-H, --header
Show header row in output. Defaults to true
-v, --invert-match
Select non-matching lines.
-d, --delimiter=DELIMITER
Specify the field delimiter. Defaults to a comma (,).
-e, --encoding=ENCODING
Specify the file encoding.
-l, --linenumbers
Show line numbers (1-based).
-p, --preamble=LINES
Rows to skip before the header row.
-z, --zero-based
Column indices start at 0.
DESCRIPTION
The csvgrep command is a powerful tool for searching and filtering data within Comma Separated Value (CSV) files. It allows users to extract specific rows from a CSV file based on matching criteria defined by regular expressions or string comparisons applied to specific columns.
Unlike basic grep which operates line by line, csvgrep understands the structure of CSV data and avoids matching within quoted fields or across multiple rows that constitute a single record. This makes it much more reliable and effective for working with structured data than simple text searching.
The command supports various options for specifying the column to search, the matching pattern, case-sensitivity, and output formatting. This makes it highly versatile for a range of data analysis and manipulation tasks.
CAVEATS
csvgrep relies on the CSV file being well-formed. Inconsistent quoting or delimiter characters can lead to incorrect results. Column names are case sensitive.
EXAMPLES
To find all rows where the 'name' column contains 'john':
csvgrep -n name john data.csv
To find all rows where the third column matches the pattern '^[A-Z]{3}$':
csvgrep 3 '^[A-Z]{3}$' data.csv