LinuxCommandLibrary

trawl

Recursively search for data in archives

TLDR

Show column names

$ trawl -n

Filter interface names using a case-insensitive regular expression
$ trawl -f wi

List available interfaces
$ trawl -i

Include the loopback interface
$ trawl -l

SYNOPSIS

trawl [options] directory

PARAMETERS

-d, --depth=DEPTH
    Maximum depth to descend into directories. Defaults to unlimited.

-e, --eval=CODE
    Evaluate Perl CODE once for each file, which is useful for extracting data. Requires a working Perl interpreter. The filename is available in the variable $_.

-f, --find=PATTERN
    Find files matching PATTERN (Perl regular expression).

-h, --help
    Show help message.

-i, --ignore=PATTERN
    Ignore files matching PATTERN (Perl regular expression).

-l, --follow
    Follow symbolic links.

-m, --md5
    Output MD5 checksum for each file.

-o, --output=FILE
    Output to FILE. Defaults to standard output.

-q, --quiet
    Quiet mode. Suppress warnings.

-r, --recurse
    Recurse into directories (default).

-s, --size
    Output size of each file.

-t, --type=TYPE
    Only process files of type TYPE. TYPE can be 'f' for file, 'd' for directory, 'l' for symbolic link, 'b' for block special, 'c' for character special, 'p' for named pipe (FIFO), or 's' for socket.

-v, --verbose
    Verbose mode. Print extra information.

-V, --version
    Show version information.
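
For comparison, several of the filters above have rough counterparts in find(1), which is listed under SEE ALSO. The sketch below assumes GNU find, grep and md5sum are installed; the function name list_matching is hypothetical, and the output layout is an illustration, not trawl's actual format. Note that grep -E uses POSIX extended regular expressions rather than Perl syntax, and that filenames containing newlines are not handled.

```shell
# list_matching DIR DEPTH REGEX  (hypothetical helper, not part of trawl)
# Approximates the documented intent of: trawl -d DEPTH -t f -f REGEX -s -m DIR
# Prints "<size in bytes> <md5> <path>" for each regular file whose path matches REGEX.
list_matching() {
    dir=$1
    depth=$2
    regex=$3
    find "$dir" -maxdepth "$depth" -type f -print |
        grep -E -- "$regex" |
        while IFS= read -r path; do
            # wc -c may pad its output with spaces on some systems
            size=$(wc -c < "$path" | tr -d ' ')
            sum=$(md5sum < "$path" | cut -d ' ' -f 1)
            printf '%s %s %s\n' "$size" "$sum" "$path"
        done
}
```

Calling list_matching . 2 '\.txt$' would list .txt files at most two levels below the current directory. Whether trawl counts depth the same way as find -maxdepth is an assumption.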

DESCRIPTION

The trawl command recursively traverses directory trees and extracts data according to user-defined rules.
It is typically used to process large filesystems, locate files of specific types, extract data from files with regular expressions or custom scripts, and act on the matched files or the extracted data.

Unlike a plain `find` invocation, trawl provides a more structured way to define extraction rules, which makes complex data-processing tasks easier to manage. It supports several extraction methods, including regular expressions, string matching, and user-defined functions written in scripting languages such as Python or Perl. Extracted data can be written in formats such as CSV, JSON, or plain text. This makes trawl well suited to data mining, log analysis, and system administration tasks.

CAVEATS

The -e option requires a Perl interpreter. Regular expressions use Perl-compatible syntax. Traversal can be resource-intensive on very large filesystems.

EXIT STATUS

Returns 0 on success and a non-zero value on error, for example when a directory cannot be read.
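
A script can branch on this exit status in the usual shell way. The following is a minimal sketch; report_status is a hypothetical helper name, and it works for any command, not only trawl.

```shell
# report_status CMD [ARGS...]  (hypothetical helper)
# Runs the command, lets its normal output through, and writes
# "ok" or "failed (N)" to standard error depending on the exit status.
report_status() {
    "$@"
    status=$?
    if [ "$status" -eq 0 ]; then
        echo "ok" >&2
    else
        echo "failed ($status)" >&2
    fi
    return "$status"
}
```

For example, report_status trawl -q -f '\.txt$' /some/dir would report a failure if trawl could not read a directory; the path /some/dir is a placeholder.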

EXAMPLES

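Find all files ending with '.txt':
trawl -f '\.txt$' .

Extract angle-bracketed email addresses from the contents of text files (the file is opened explicitly because $_ holds the filename, not the contents):
trawl -f '\.txt$' -e 'if (open(my $fh, "<", $_)) { while (my $line = <$fh>) { print "$1\n" while $line =~ /<([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})>/g; } close($fh); }' .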
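
The same kind of extraction can be approximated with find(1) and grep(1) from SEE ALSO. This is a sketch under assumptions: it relies on the -o and -h options of GNU or BSD grep, the function name extract_emails is hypothetical, and like the pattern above it only matches addresses wrapped in angle brackets.

```shell
# extract_emails DIR  (hypothetical helper, not part of trawl)
# Prints every angle-bracketed email address found in *.txt files
# under DIR, one per line, with the surrounding brackets removed.
extract_emails() {
    find "$1" -type f -name '*.txt' -exec \
        grep -hoE '<[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}>' {} + |
        tr -d '<>'
}
```

Running extract_emails . searches the current directory tree. Files without a .txt suffix and addresses without angle brackets are skipped.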

SEE ALSO

find(1), grep(1), awk(1), sed(1)
