ugrep

Search files using flexible pattern matching

TLDR

Start a query TUI to search files in the current directory recursively (press F1 or Ctrl+Z for help)

$ ugrep [[-Q|--query]]

Search the current directory recursively for files containing a regex search pattern

$ ugrep "[search_pattern]"

Search in a specific file or in all files in a specific directory, showing line numbers of matches

$ ugrep [[-n|--line-number]] "[search_pattern]" [path/to/file_or_directory]

Search in all files in the current directory recursively and print the name of each matching file

$ ugrep [[-l|--files-with-matches]] "[search_pattern]"

Fuzzy search files with up to 3 extra, missing, or mismatching characters in the pattern

$ ugrep [[-Z|--fuzzy=]][3] "[search_pattern]"

Also search compressed files, Zip and .tar archives recursively

$ ugrep [[-z|--decompress]] "[search_pattern]"

Search only files whose filenames match a specific glob pattern

$ ugrep [[-g |--glob=]]"[glob_pattern]" "[search_pattern]"

Search only C++ source files (use --file-type=list to list all file types)

$ ugrep [[-t |--file-type=]]cpp "[search_pattern]"

PATTERN
    The regular expression or fixed string to search for.

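FILE...
    One or more files or directories to search. If no files are specified, ugrep searches the current directory recursively when standard input is a terminal, and reads standard input otherwise.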

-i, --ignore-case
    Ignores case distinctions when matching the PATTERN.

-v, --invert-match
    Selects lines that do not match the PATTERN.

-r, --recursive
    Recursively searches through directories and their subdirectories.

-l, --files-with-matches
    Prints only the names of files containing matches, one per line.

-n, --line-number
    Prefixes each match with the 1-based line number within its input file.

-E, --extended-regexp
    Interprets PATTERN as an extended regular expression (ERE).

-F, --fixed-strings
    Interprets PATTERN as a list of fixed strings, separated by newlines, instead of regular expressions.

-P, --perl-regexp
    Interprets PATTERN as a Perl Compatible Regular Expression (PCRE).

--encoding=ENCODING
    Specifies the input character encoding (e.g., UTF-16, ISO-8859-1). If not specified, ugrep detects UTF-8, UTF-16, and UTF-32 from a byte order mark (BOM) and otherwise treats input as UTF-8.

-Z[best][MAX], --fuzzy[=[best][MAX]]
    Performs approximate (fuzzy) matching, allowing up to MAX errors (insertions, deletions, substitutions); the default MAX is 1. With the best prefix, as in -Zbest3, only the closest-matching lines in each file are reported, even if they are not exact matches.
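As a minimal sketch of how several of the grep-style options above combine, the following shell session builds a small throwaway log file and searches it. It assumes ugrep is installed and on PATH; the file name app.log and its contents are made up for illustration.

```shell
# Hypothetical sample data in a temporary directory.
tmp=$(mktemp -d)
printf 'INFO start\nError: disk full\nwarning: low memory\nERROR: timeout\n' > "$tmp/app.log"

if command -v ugrep >/dev/null 2>&1; then
  # -i ignores case, -n prefixes each matching line with its line number.
  ugrep -i -n 'error' "$tmp/app.log"

  # -v inverts the selection: print the lines that do NOT match.
  ugrep -i -v 'error' "$tmp/app.log"

  # -F treats the pattern as a fixed string rather than a regular expression.
  ugrep -F 'disk full' "$tmp/app.log"

  # -l prints only the name of each file that contains a match.
  ugrep -i -l 'error' "$tmp/app.log"
else
  echo "ugrep is not installed" >&2
fi
```

Short options can usually be bundled as in grep, so -i -n can also be written -in.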

DESCRIPTION

ugrep is a command-line utility that searches files for lines matching a regular expression. It is designed as a grep-compatible tool with built-in Unicode support: patterns are matched against Unicode text, input in UTF-8, UTF-16, or UTF-32 is detected automatically, and other encodings can be selected with --encoding. This makes it well suited to internationalized text and multi-byte character sets, where a tool that treats input as raw bytes can miss matches or report incorrect ones.

It keeps the familiar grep options and adds features such as an interactive query TUI, fuzzy matching, and searching inside compressed files and archives. ugrep is also optimized for performance, using multi-threading to search large files and directory trees quickly, which makes it suitable for both casual use and heavier data analysis tasks.

CAVEATS

ugrep is not pre-installed on most Linux distributions and usually has to be installed from a package manager or built from source. Features such as fuzzy matching and the query TUI are ugrep extensions (ugrep is developed by Robert van Engelen) and are not available in POSIX or GNU grep, so scripts that rely on them are not portable to other grep implementations.

ENCODING AUTO-DETECTION

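ugrep attempts to detect the character encoding of each input file automatically, based on a Unicode byte order mark (BOM) for UTF-8, UTF-16, and UTF-32. Files without a BOM are treated as UTF-8; any other encoding must be named with --encoding. For BOM-marked files, this removes the need to specify the encoding of each file by hand.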
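The difference between detected and explicitly named encodings can be sketched with two tiny test files. This is an illustration under stated assumptions: ugrep is installed, the script itself is saved as UTF-8, and the file names utf16.txt and latin1.txt are hypothetical.

```shell
tmp=$(mktemp -d)

# UTF-16LE text "hi" plus newline, preceded by the BOM bytes FF FE (octal 377 376).
printf '\377\376h\000i\000\n\000' > "$tmp/utf16.txt"

# ISO-8859-1 text "café" plus newline: byte E9 (octal 351) is "é"; there is no BOM.
printf 'caf\351\n' > "$tmp/latin1.txt"

if command -v ugrep >/dev/null 2>&1; then
  # A BOM is present, so no --encoding option is needed.
  ugrep 'hi' "$tmp/utf16.txt"

  # No BOM: name the encoding explicitly. The pattern itself is typed as UTF-8.
  ugrep --encoding=ISO-8859-1 'café' "$tmp/latin1.txt"
else
  echo "ugrep is not installed" >&2
fi
```

Without --encoding, the second file is not valid UTF-8, so a search for 'café' would not be expected to find the line.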

PERFORMANCE OPTIMIZATIONS

Beyond Unicode handling, ugrep is optimized for speed, using techniques such as memory mapping (mmap) and multi-threading to search large datasets and directory trees quickly.
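The effect of multi-threading can be explored with the -J (--jobs) option, which is taken from the ugrep manual rather than from the option list on this page. The sketch below assumes ugrep is installed; the directory layout and file names are made up for illustration, and on such a small tree no speed difference will be visible.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/a" "$tmp/b"
printf 'needle here\n'    > "$tmp/a/one.txt"
printf 'nothing\n'        > "$tmp/a/two.txt"
printf 'another needle\n' > "$tmp/b/three.txt"

if command -v ugrep >/dev/null 2>&1; then
  # -r recurses into the directory; -J2 asks for two search threads.
  ugrep -r -l -J2 'needle' "$tmp"

  # -J1 uses a single search thread, which is useful as a baseline for timing.
  ugrep -r -l -J1 'needle' "$tmp"
else
  echo "ugrep is not installed" >&2
fi
```

On a real source tree, prefixing each command with time gives a rough comparison; results depend heavily on disk caching and file sizes.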

APPROXIMATE/FUZZY MATCHING

A standout feature of ugrep is approximate (fuzzy) matching with the -Z (--fuzzy) option. It finds text that is 'close enough' to the specified PATTERN by tolerating a limited number of errors (insertions, deletions, substitutions), which is useful for searching noisy data or when the exact spelling is uncertain.
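A small example makes the error budget concrete. It assumes ugrep is installed; the word list is made up. With one allowed error, "color" is within reach of the pattern "colour" (one missing character), while "collar" would need two substitutions.

```shell
tmp=$(mktemp -d)
printf 'colour\ncolor\ncollar\n' > "$tmp/words.txt"

if command -v ugrep >/dev/null 2>&1; then
  # Exact search: only the line spelled "colour" is selected.
  ugrep 'colour' "$tmp/words.txt"

  # Fuzzy search allowing at most one insertion, deletion, or substitution.
  ugrep -Z1 'colour' "$tmp/words.txt"
else
  echo "ugrep is not installed" >&2
fi
```

Raising the limit (for example -Z2) widens the net, at the cost of more false positives.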

HISTORY

ugrep emerged as a response to the limitations of traditional grep when dealing with the increasing prevalence of multi-byte character encodings, particularly UTF-8. As globalized content became standard, the need grew for a search tool that could correctly interpret and match patterns across Unicode scripts. ugrep, developed by Robert van Engelen, expanded on grep's capabilities by integrating Unicode support, several regular expression syntaxes, and performance optimizations. It addresses the challenge of accurately searching internationalized text files, which often proved problematic for older tools not designed with Unicode in mind.