LinuxCommandLibrary

tsv-filter

Filter rows in tab-separated value (TSV) data

TLDR

Print the lines where a specific column is numerically equal to a given number

$ tsv-filter -H --eq [field_name]:[number] [path/to/tsv_file]

Print the lines where a specific column is [eq]ual/[n]ot [e]qual/[l]ess [t]han/[l]ess than or [e]qual/[g]reater [t]han/[g]reater than or [e]qual to a given number
$ tsv-filter --[eq|ne|lt|le|gt|ge] [column_number]:[number] [path/to/tsv_file]

Print the lines where a specific column is [eq]ual to/[n]ot [e]qual to a given string, or contains/does not contain that string
$ tsv-filter --str-[eq|ne|in-fld|not-in-fld] [column_number]:[string] [path/to/tsv_file]

Print the lines where a specific column is not empty
$ tsv-filter --not-empty [column_number] [path/to/tsv_file]

Print the lines where a specific column is empty
$ tsv-filter --invert --not-empty [column_number] [path/to/tsv_file]

Print the lines that satisfy two conditions
$ tsv-filter --eq [column_number1]:[number] --str-eq [column_number2]:[string] [path/to/tsv_file]

Print the lines that match at least one condition
$ tsv-filter --or --eq [column_number1]:[number] --str-eq [column_number2]:[string] [path/to/tsv_file]

Count matching lines, interpreting the first line as a [H]eader
$ tsv-filter --count -H --eq [field_name]:[number] [path/to/tsv_file]

SYNOPSIS

tsv-filter [options] [file...]

PARAMETERS

-h
    Display help message.

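-H, --header
    Treat the first line of each file as a header, so that fields can be referenced by name as well as by number.

-c, --count
    Print only the number of matching lines instead of the lines themselves.

-v, --invert
    Invert the filter: print the lines that do not match.

--or
    Combine multiple tests with OR instead of the default AND.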

--eq FIELD:NUM
    Keep rows where the field is numerically equal to NUM. FIELD is a 1-based column number, or a field name when -H is given.

--ne FIELD:NUM
    Keep rows where the field is not numerically equal to NUM.

--gt FIELD:NUM
    Keep rows where the field is greater than NUM.

--lt FIELD:NUM
    Keep rows where the field is less than NUM.

--ge FIELD:NUM
    Keep rows where the field is greater than or equal to NUM.

--le FIELD:NUM
    Keep rows where the field is less than or equal to NUM.

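--str-eq FIELD:STR
    Keep rows where the field is equal to the string STR.

--str-ne FIELD:STR
    Keep rows where the field is not equal to the string STR.

--str-in-fld FIELD:STR
    Keep rows where the field contains STR as a substring.

--str-not-in-fld FIELD:STR
    Keep rows where the field does not contain STR.

--not-empty FIELD
    Keep rows where the field is not empty.

--regex FIELD:REGEX
    Keep rows where the field matches the regular expression REGEX.

--iregex FIELD:REGEX
    Like --regex, but the match is case-insensitive.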

DESCRIPTION

The `tsv-filter` command, part of the tsv-utils toolkit, filters tab-separated value (TSV) data. It reads lines from files or standard input, applies one or more tests to individual fields, and prints only the lines that pass. Tests include numeric comparisons (equal, not equal, less than, greater than, and so on), string comparisons, substring checks, empty-field checks, and regular expression matches.

Fields are identified by 1-based column number, or by name when the first line is treated as a header (-H). When several tests are given, a line must pass all of them by default; --or changes this so that passing any one test is enough, and --invert prints the lines that fail instead. Because it is an ordinary command-line filter, it fits into pipelines and scripts for extracting subsets of large TSV files before further analysis.
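
To make the way multiple conditions combine more concrete, the sketch below builds a small sample file and reproduces an AND filter and an OR filter with awk(1). The file name sample.tsv and its contents are invented for illustration, and the awk expressions are only a rough approximation of what the corresponding `tsv-filter` invocations select, not the tool's implementation.

```shell
# Hypothetical sample data: three tab-separated columns, no header line.
tmp=$(mktemp -d)
printf 'a\texample\t150\nb\texample\t50\nc\tother\t200\nd\tother\t10\n' > "$tmp/sample.tsv"

# AND (the default): column 3 > 100 and column 2 equals "example".
# Roughly: tsv-filter --gt 3:100 --str-eq 2:example sample.tsv
and_rows=$(awk -F'\t' '$3 > 100 && $2 == "example"' "$tmp/sample.tsv")

# OR: passing either condition is enough.
# Roughly: tsv-filter --or --gt 3:100 --str-eq 2:example sample.tsv
or_rows=$(awk -F'\t' '$3 > 100 || $2 == "example"' "$tmp/sample.tsv")

printf '%s\n' "$and_rows"   # only row "a" passes both tests
printf -- '---\n'
printf '%s\n' "$or_rows"    # rows "a", "b" and "c" pass at least one test
```

Row "d" fails both tests, so it appears in neither result.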

CAVEATS

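The tool assumes a simple TSV format: fields are separated by tab characters and there is no quoting or escaping, so a field cannot itself contain a tab. Numeric comparisons (--eq, --ne, --gt, --lt, --ge, --le) expect numeric data; a column containing non-numeric values, including a header line that is not skipped with -H, can cause errors or unexpected results.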
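
One way to check a column before applying numeric tests is to list the lines whose value does not look like a number. The following is a minimal sketch using awk(1); the file scores.tsv and its contents are hypothetical, and the regular expression is a simplification that accepts only plain decimal numbers (no exponent notation, NaN, or infinity).

```shell
# Hypothetical sample data with a header line and one non-numeric value.
tmp=$(mktemp -d)
printf 'name\tscore\nalice\t12.5\nbob\tn/a\ncarol\t-3\n' > "$tmp/scores.tsv"

# Print the line numbers whose 2nd column is not a plain decimal number.
bad_lines=$(awk -F'\t' '$2 !~ /^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)$/ { print NR }' "$tmp/scores.tsv")

printf '%s\n' "$bad_lines"   # reports line 1 (the header) and line 3 ("n/a")
```

The header shows up in the report because it is not numeric either, which is one reason to pass -H when the file has a header line.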

EXAMPLES

  • To filter rows where the value in the 2nd column is equal to the string 'example':
    tsv-filter --str-eq 2:example file.tsv
  • To filter rows where the value in the 3rd column is greater than 100:
    tsv-filter --gt 3:100 file.tsv
  • To filter rows where the value in the 1st column matches the regular expression 'pattern':
    tsv-filter --regex 1:pattern file.tsv
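
Filters of this kind can be tried on a small throwaway file. The sketch below is a hedged illustration: the file contents are made up, the field:value syntax follows the TLDR entries at the top of this page, and the commands are skipped if `tsv-filter` is not installed.

```shell
# Hypothetical sample file: id, label, value (no header line).
tmp=$(mktemp -d)
printf 'pattern-1\texample\t150\nitem-2\texample\t50\npattern-3\tother\t200\n' > "$tmp/file.tsv"

if command -v tsv-filter >/dev/null 2>&1; then
    # Rows whose 2nd column is exactly "example".
    tsv-filter --str-eq 2:example "$tmp/file.tsv"
    # Rows whose 3rd column is greater than 100.
    tsv-filter --gt 3:100 "$tmp/file.tsv"
    # Rows whose 1st column starts with "pattern".
    tsv-filter --regex 1:'^pattern' "$tmp/file.tsv"
else
    echo "tsv-filter not found; install tsv-utils to run these commands" >&2
fi

# Number of rows in the sample file, for reference.
row_count=$(wc -l < "$tmp/file.tsv" | tr -d ' ')
```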

SEE ALSO

awk(1), cut(1), grep(1), sed(1)
