rga

Search recursively with ripgrep in PDFs, e-books, office documents, archives, and more

TLDR

Search recursively for a pattern in all files in the current directory

$ rga [regex]

List available adapters
$ rga --rga-list-adapters

Change which adapters to use (e.g. ffmpeg, pandoc, poppler etc.)
$ rga --rga-adapters=[adapter1,adapter2] [regex]

Search for a pattern using the mime type instead of the file extension (slower)
$ rga --rga-accurate [regex]

Display help
$ rga --help

SYNOPSIS

rga [options] <pattern> [path ...]

Note: rga supports most ripgrep options in addition to its own.

PARAMETERS

pattern
    The regular expression or literal string to search for.

[path ...]
    One or more paths to files or directories to search. If omitted, rga searches the current directory.

--rga-config-file=<path>
    Reads configuration from the given file instead of the default location (on Linux typically ~/.config/ripgrep-all/config.jsonc).

--rga-no-cache
    Disables the cache of extracted text. By default, rga caches adapter output so that repeated searches of the same files are faster.

--rga-list-adapters
    Lists the available adapters, the file types each one handles, and whether it is enabled by default.

--rga-adapters=<adapter1,adapter2>
    Selects which adapters rga uses (e.g. ffmpeg, pandoc, poppler).

--rga-accurate
    Detects file types by MIME type (file content) instead of by file extension. Slower, but works for files with missing or misleading extensions.

(any ripgrep option)
    rga passes most standard ripgrep options directly to the underlying rg command. Examples include --ignore-case (-i), --word-regexp (-w), --line-number (-n), --fixed-strings (-F), --glob (-g), and many others for controlling search behavior, output format, and file filtering.

DESCRIPTION


The rga command, short for ripgrep-all, extends the search capabilities of ripgrep to many non-plain-text file types. While ripgrep searches plain text and source code, rga can also search inside PDFs, e-books (EPUB), office documents (DOCX, ODT), archives and compressed files (ZIP, tar, gz), SQLite databases, and the metadata and subtitles of media files. Which formats are available depends on the installed version and its enabled adapters; OCR of images, where supported at all, is not enabled by default.

rga achieves this by acting as a wrapper around ripgrep. For each file it determines the file type (by extension, or by MIME type with --rga-accurate) and, if an adapter matches, extracts the textual content. Some adapters call external tools such as pdftotext, pandoc, or ffmpeg; others, such as the archive adapters, are built in. The extracted text is then handed to ripgrep, which performs the actual pattern matching. This combines ripgrep's matching speed with a much wider range of searchable files.
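The extract-then-search flow described above can be imitated with standard tools. The following is only a conceptual sketch, not rga's implementation: gzip stands in for an adapter and grep stands in for ripgrep, and the file names are made up for the illustration.

```shell
#!/bin/sh
# Conceptual sketch only: "adapter" = gzip -dc, "searcher" = grep.
# None of this is rga's actual code; paths and contents are illustrative.

workdir=$(mktemp -d)
printf 'alpha\nneedle in a haystack\nomega\n' > "$workdir/notes.txt"
gzip "$workdir/notes.txt"   # leaves notes.txt.gz, which is no longer plain text

# Step 1: convert the file into a plain-text stream.
# Step 2: run the pattern match on that stream.
matches=$(gzip -dc "$workdir/notes.txt.gz" | grep -n 'needle')
echo "$matches"

rm -rf "$workdir"
```

With rga installed, a single call such as `rga needle [directory]` is expected to perform both steps for every supported file type it finds.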

CAVEATS

Several of rga's adapters depend on external helper programs, for example pdftotext (from poppler-utils) for PDFs, pandoc for office documents and e-books, and ffmpeg for media files. If a required tool is missing, rga cannot extract text from the corresponding file types. Text extraction is slower than searching plain text, and very large or malformed archives and documents can consume significant resources or fail. OCR of image files, where an OCR adapter is available, is especially expensive. The configuration file (on Linux typically ~/.config/ripgrep-all/config.jsonc) can be edited to change default options or add adapters.

CONFIGURATION AND ADAPTERS

rga's extensibility rests on its adapters, each of which converts specific file types into plain text. For instance, the poppler adapter runs pdftotext on PDFs, while the pandoc adapter converts formats such as DOCX, ODT, and EPUB. The configuration file (on Linux typically ~/.config/ripgrep-all/config.jsonc, or the path given with --rga-config-file) sets defaults for rga's options and, in recent versions, can define custom adapters that call an external program to handle additional formats.

PERFORMANCE CONSIDERATIONS

While rga benefits from ripgrep's speed, overall performance depends on how quickly the adapters can extract text. Extraction from complex inputs such as large PDFs or nested archives can add significant overhead. rga mitigates this by caching extracted text, so subsequent searches of unchanged files skip the extraction step and run much faster. The first search of a file still incurs the full extraction cost, and the cache can be turned off with --rga-no-cache.
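The effect of caching extracted text can be illustrated with a small shell sketch. It is an assumption-laden toy, not rga's cache: the helper `extract_cached`, the content-checksum cache key, and the log file are all hypothetical and chosen only to show that the second search skips extraction.

```shell
#!/bin/sh
# Toy extraction cache; rga's real cache format and keying scheme differ.

cache_dir=$(mktemp -d)
src=$(mktemp)
printf 'cached needle\n' > "$src"
gzip -c "$src" > "$src.gz"

# Hypothetical helper: print the extracted text of $1, extracting only on a cache miss.
extract_cached() {
    key=$(cksum < "$1" | tr ' ' '_')            # key derived from the file contents (assumption)
    if [ ! -f "$cache_dir/$key.txt" ]; then
        gzip -dc "$1" > "$cache_dir/$key.txt"   # slow path: run the extractor once
        echo "extracted $1" >> "$cache_dir/extract.log"
    fi
    cat "$cache_dir/$key.txt"
}

first=$(extract_cached "$src.gz" | grep -c 'needle')    # cache miss: extracts, then searches
second=$(extract_cached "$src.gz" | grep -c 'needle')   # cache hit: searches the stored text
extractions=$(grep -c 'extracted' "$cache_dir/extract.log")
echo "matches: $first and $second, extractions: $extractions"

rm -rf "$cache_dir" "$src" "$src.gz"
```

Both searches find the match, but the extractor runs only once, which is the saving the cache is meant to provide.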

HISTORY

rga was created by phiresky to address ripgrep's restriction to plain text files. The project builds on ripgrep's speed and robustness while extending its reach to common binary document formats. It has become a popular tool for developers and other users who need to search documents, archives, and source code with a single command.

SEE ALSO

rg(1) - ripgrep
grep(1) - print lines matching a pattern
find(1) - search for files in a directory hierarchy
catdoc(1) - reads Microsoft Word files and prints their content as text
pdftotext(1) - Portable Document Format (PDF) to text converter (part of poppler-utils)
unzip(1) - list, test and extract compressed files in a ZIP archive
tesseract(1) - an OCR engine
