ugrep
Search files using flexible pattern matching
TLDR
Start a query TUI to search files in the current directory recursively
Search the current directory recursively for files containing a regex search pattern
Search in a specific file or in all files in a specific directory, showing line numbers of matches
Search in all files in the current directory recursively and print the name of each matching file
Fuzzy search files with up to 3 extra, missing or mismatching characters in the pattern
Also search compressed files, Zip and tar archives recursively
Search only files whose filenames match a specific glob pattern
Search only C++ source files (use --file-type=list to list all file types)
SYNOPSIS
ugrep [OPTIONS] PATTERN [FILE...]
PARAMETERS
PATTERN
The regular expression or fixed string to search for.
FILE...
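One or more files or directories to search. If no FILE is given, ugrep reads standard input. The exception is when standard input is a terminal: then ugrep searches the working directory recursively, as if -r were specified.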
-i, --ignore-case
Ignores case distinctions when matching the PATTERN.
-v, --invert-match
Selects lines that do not match the PATTERN.
-r, --recursive
Recursively searches through directories and their subdirectories.
-l, --files-with-matches
Prints only the names of files containing matches, one per line.
-n, --line-number
Prefixes each match with the 1-based line number within its input file.
-E, --extended-regexp
Interprets PATTERN as an extended regular expression (ERE). This is the default.
-F, --fixed-strings
Interprets PATTERN as a list of fixed strings, separated by newlines, instead of regular expressions.
-P, --perl-regexp
Interprets PATTERN as a Perl Compatible Regular Expression (PCRE).
--encoding=ENCODING
Specifies the input character encoding (e.g., UTF-16, ISO-8859-1). Without this option, input is treated as UTF-8. UTF-16 and UTF-32 input is detected automatically when it starts with a byte-order mark (BOM).
--best-match
Enables fuzzy matching, showing the match that is closest to the PATTERN, even if it's not an exact match.
--approximate=NUM
Performs approximate (fuzzy) matching, allowing up to NUM errors (insertions, deletions, substitutions).
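The line-selection options above (-i, -v, -F, -n) combine in a simple way. The following minimal Python sketch models their documented semantics for a single input. It is not ugrep's implementation. The function select_lines is a hypothetical name, the `N:line` output format is an assumption borrowed from grep, and Python's re syntax stands in for ugrep's regex syntax.

```python
import re


def select_lines(pattern, text, ignore_case=False, invert=False,
                 fixed_strings=False, line_numbers=False):
    """Model how -i, -v, -F and -n shape grep-style output for one input.

    Illustrative only: a sketch of the option semantics, not ugrep's code.
    """
    if fixed_strings:
        # -F: PATTERN is a newline-separated list of literal strings,
        # so escape each one and match any of them.
        pattern = "|".join(re.escape(s) for s in pattern.split("\n"))
    flags = re.IGNORECASE if ignore_case else 0  # -i
    regex = re.compile(pattern, flags)
    selected = []
    # -n: line numbers are 1-based within the input.
    for lineno, line in enumerate(text.splitlines(), start=1):
        matched = regex.search(line) is not None
        if matched != invert:  # -v keeps the lines that do NOT match
            selected.append(f"{lineno}:{line}" if line_numbers else line)
    return selected
```

Take `text = "Alpha\nbeta\ngamma\n"` as an example. `select_lines("beta\ngamma", text, fixed_strings=True, line_numbers=True)` returns `["2:beta", "3:gamma"]`. Without -F, a pattern such as `a.` would be treated as a regex, and the dot would match any character.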
DESCRIPTION
ugrep is a command-line utility for searching files for lines that match a regular expression or fixed string. Compared with traditional grep, it has broader Unicode support. Besides UTF-8, it searches UTF-16 and UTF-32 input, which it detects from a byte-order mark, and it can search many other encodings named with --encoding. This helps with internationalized text and multi-byte character sets, where a byte-oriented grep can miss matches. For example, in a UTF-16 file every ASCII character is interleaved with a zero byte, so a plain ASCII pattern never matches.
It keeps grep's familiar command-line interface while making pattern matching Unicode-aware. It is also built for speed, using efficient matching algorithms and searching files on multiple threads, so it scales to large files and directory trees.
CAVEATS
ugrep is not part of POSIX and is not pre-installed on most Linux distributions, so it usually has to be installed from a package repository or built from source. Its extensions, such as fuzzy matching, have no counterpart in standard grep. Scripts that rely on them are therefore not portable to systems where only grep is available.
ENCODING AUTO-DETECTION
ugrep detects UTF-16 and UTF-32 input automatically from its byte-order mark (BOM), so such files can be searched without naming their encoding. Files in other encodings, such as ISO-8859-1, must be identified with --encoding.
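BOM-based detection amounts to inspecting the first few bytes of the input. The minimal Python sketch below shows the idea, assuming detection works this way. The names sniff_encoding and decode_input are hypothetical, and falling back to UTF-8 when no BOM is present is an assumption of this sketch.

```python
import codecs

# Byte-order marks, checked longest-first: the UTF-32LE BOM (FF FE 00 00)
# begins with the UTF-16LE BOM (FF FE), so the order matters.
# Assumption: a UTF-16LE file whose first character is U+0000 would be
# misread as UTF-32LE; this sketch accepts that ambiguity.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_encoding(data: bytes, default: str = "utf-8"):
    """Return (encoding, bom_length) guessed from a leading byte-order mark."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return default, 0  # no BOM: fall back to the assumed default


def decode_input(data: bytes, default: str = "utf-8") -> str:
    """Decode raw file bytes, stripping any BOM before decoding."""
    encoding, skip = sniff_encoding(data, default)
    return data[skip:].decode(encoding, errors="replace")
```

Input without a BOM carries no encoding signal, so a single-byte encoding such as ISO-8859-1 has to be supplied by the caller. That is the role --encoding plays for ugrep.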
PERFORMANCE OPTIMIZATIONS
Beyond Unicode handling, ugrep is built for speed. It searches files concurrently on multiple worker threads, and the thread count can be set with -J/--jobs. It can also read large files through memory mapping (mmap). Together, these techniques keep searches of large datasets fast.
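The two techniques named above can be combined in a short Python sketch. A thread pool stands in for ugrep's worker threads and searches memory-mapped files, reporting those that contain a match. This is roughly what `ugrep -r -l PATTERN` prints. The names files_with_matches and file_matches are hypothetical, and this is not ugrep's implementation.

```python
import mmap
import os
import re
from concurrent.futures import ThreadPoolExecutor


def file_matches(path, regex):
    """Return True if the compiled bytes regex matches anywhere in the file."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return False  # mmap cannot map an empty file
        # Map the whole file read-only; re can search the mapping directly.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return regex.search(mm) is not None


def files_with_matches(root, pattern, jobs=4):
    """List files under `root` (recursively) whose contents match `pattern`."""
    regex = re.compile(pattern.encode())
    paths = [os.path.join(d, name)
             for d, _, names in os.walk(root) for name in names]
    # Search files concurrently on a pool of `jobs` worker threads.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        hits = pool.map(lambda p: (p, file_matches(p, regex)), paths)
        return sorted(p for p, ok in hits if ok)
```

In CPython, the regex search holds the global interpreter lock, so these threads mainly overlap file I/O. A native tool like ugrep can run the matching itself in parallel.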
APPROXIMATE/FUZZY MATCHING
A distinctive feature of ugrep is approximate (fuzzy) matching. It finds text that is "close enough" to PATTERN, tolerating up to a given number of errors: insertions, deletions, or substitutions of characters. For example, with one error allowed, the pattern hello also matches helo. This helps when searching noisy data or when the exact spelling is uncertain.
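The error count can be made concrete with Sellers' dynamic-programming algorithm for approximate substring search, which counts the same three edit operations. The sketch below works on plain strings rather than regular expressions. It is not claimed to be how ugrep implements fuzzy matching, and fuzzy_find is a hypothetical name.

```python
def fuzzy_find(pattern, text, max_errors):
    """Sellers' algorithm: return the (exclusive) end offsets in `text` where
    some substring matches `pattern` within `max_errors` insertions,
    deletions or substitutions."""
    m = len(pattern)
    # prev[i] = fewest edits turning pattern[:i] into the best substring of
    # text ending at the current position. Row 0 is always 0, so a match
    # may start anywhere in the text.
    prev = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text, start=1):
        cur = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            cur[i] = min(prev[i - 1] + cost,  # match or substitution
                         prev[i] + 1,         # extra character in the text
                         cur[i - 1] + 1)      # pattern character missing
        if cur[m] <= max_errors:
            ends.append(j)  # an approximate match ends just before text[j]
        prev = cur
    return ends
```

For example, `fuzzy_find("hello", "say helo there", 1)` returns `[8]`. The only approximate match is "helo", which ends at offset 8 and needs one deletion.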
HISTORY
ugrep was designed to address the limitations of traditional grep with multi-byte character encodings, particularly UTF-8. As globalized content became standard, a search tool that could correctly interpret and match patterns across Unicode scripts became increasingly important. ugrep, developed by Robert van Engelen at Genivia, extends grep's capabilities with Unicode support, a choice of regular expression engines (including PCRE2 for -P), and performance optimizations. This makes it practical to search internationalized text files that older tools without comprehensive Unicode awareness often handled poorly.