trawl
Recursively search for data in archives
TLDR
Show column names
Filter interface names using a case-insensitive regular expression
List available interfaces
Include the loopback interface
SYNOPSIS
trawl [options] directory
PARAMETERS
-d, --depth=DEPTH
Maximum depth to descend into directories. Defaults to unlimited.
-e, --eval=CODE
Evaluate the Perl CODE once for each file; useful for extracting data. Requires a working Perl interpreter. The filename is stored in the variable $_.
-f, --find=PATTERN
Find files matching PATTERN (Perl regular expression).
-h, --help
Show help message.
-i, --ignore=PATTERN
Ignore files matching PATTERN (Perl regular expression).
-l, --follow
Follow symbolic links.
-m, --md5
Output MD5 checksum for each file.
-o, --output=FILE
Output to FILE. Defaults to standard output.
-q, --quiet
Quiet mode. Suppress warnings.
-r, --recurse
Recurse into directories (default).
-s, --size
Output size of each file.
-t, --type=TYPE
Only process files of type TYPE. TYPE can be 'f' for file, 'd' for directory, 'l' for symbolic link, 'b' for block special, 'c' for character special, 'p' for named pipe (FIFO), or 's' for socket.
-v, --verbose
Verbose mode. Print extra information.
-V, --version
Show version information.
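The type letters accepted by -t are the same ones that `find -type` uses. As an illustration only (this is not trawl's source code), the following Python sketch shows one way such letters can be derived from a file's mode bits. The helper name `type_letter` is hypothetical.

```python
import os
import stat

# Hypothetical lookup table: type letter -> predicate on the st_mode bits.
# The symlink check comes first because lstat() reports the link itself.
_CHECKS = [
    ("l", stat.S_ISLNK),   # symbolic link
    ("d", stat.S_ISDIR),   # directory
    ("f", stat.S_ISREG),   # regular file
    ("b", stat.S_ISBLK),   # block special
    ("c", stat.S_ISCHR),   # character special
    ("p", stat.S_ISFIFO),  # named pipe (FIFO)
    ("s", stat.S_ISSOCK),  # socket
]


def type_letter(path):
    """Return the single-letter type code for path, or None if none applies."""
    mode = os.lstat(path).st_mode  # lstat: do not follow symbolic links
    for letter, check in _CHECKS:
        if check(mode):
            return letter
    return None
```

A filter equivalent to `-t f` would then keep only the paths for which `type_letter(path) == "f"`.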
DESCRIPTION
The trawl command recursively traverses directory trees and extracts data according to user-defined rules.
It is typically used to process large filesystems, locate specific file types, extract data from files with regular expressions or custom scripts, and act on the matched files or the extracted data.
Compared with a plain `find` invocation, trawl offers a more structured way to define extraction rules, which makes complex data-processing tasks easier to manage. It supports several extraction methods, including regular expressions, string matching, and user-defined functions written in scripting languages such as Python or Perl. Extracted data can be written as CSV, JSON, or plain text, which suits data mining, log analysis, and system administration tasks.
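To make the traversal model concrete, here is a minimal Python sketch of a depth-limited walk with include and exclude patterns, loosely mirroring the -d, -f and -i options. It is not trawl's implementation. The function name `walk_matching` is hypothetical, Python's `re` syntax is close to but not identical to Perl's, and two behaviors are assumptions: patterns are searched against the full path, and a depth of 0 means only the entries directly inside the starting directory.

```python
import os
import re


def walk_matching(root, find=None, ignore=None, max_depth=None):
    """Yield file paths under root, filtered roughly like -f, -i and -d.

    Assumptions (not confirmed by the trawl documentation):
    - patterns are searched against the full path, not only the basename;
    - depth 0 means the entries directly inside root.
    """
    find_re = re.compile(find) if find else None
    ignore_re = re.compile(ignore) if ignore else None

    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        if max_depth is not None and depth >= max_depth:
            dirnames[:] = []  # prune: stop os.walk from descending further

        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if ignore_re and ignore_re.search(path):
                continue  # excluded, as with -i
            if find_re and not find_re.search(path):
                continue  # not selected, as with -f
            yield path
```

In this sketch the ignore pattern is checked before the find pattern, so a file matching both is skipped; whether trawl resolves that conflict the same way is not stated in its documentation.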
CAVEATS
The -e option requires a Perl interpreter. Patterns use Perl regular expression syntax. Traversing very large filesystems can be resource-intensive.
EXIT STATUS
Returns 0 on successful completion and a non-zero value on error, for example when a directory cannot be read.
EXAMPLES
Find all files ending with '.txt':
trawl -f '\.txt$' .
Extract email addresses enclosed in angle brackets (for example, <user@example.com>) from text files:
trawl -f '\.txt$' -e 'while (/<([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})>/g) { print $1, "\n"; }' .
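The pattern in the last example only matches addresses wrapped in angle brackets, which is easy to overlook. The Python sketch below applies the same regular expression to a sample string to show what is and is not captured. It illustrates the pattern only, not trawl itself, and the function name `extract_bracketed_emails` and the sample text are made up for this illustration.

```python
import re

# Same pattern as in the trawl example: the capture group holds the address,
# and the surrounding < > are required for a match.
EMAIL_IN_BRACKETS = re.compile(
    r"<([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})>"
)


def extract_bracketed_emails(text):
    """Return every address that appears between angle brackets in text."""
    return EMAIL_IN_BRACKETS.findall(text)


sample = (
    "From: Ann <ann@example.com>\n"
    "Cc: bob@example.org, Eve <eve@mail.example.net>"
)
print(extract_bracketed_emails(sample))
# prints ['ann@example.com', 'eve@mail.example.net']
```

The bare address `bob@example.org` is skipped because it has no surrounding brackets. Dropping the `<` and `>` from the pattern would capture it as well.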