LinuxCommandLibrary

Regular Expressions

Basic Matching

A regular expression (regex) is a pattern that describes a set of strings. Most Linux tools like grep, sed, and awk support regex for searching and transforming text. A literal string is the simplest pattern — it matches itself.
$ echo "hello world" | grep "hello"
$ grep -i "error" /var/log/syslog
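
As a small sketch of literal matching, the snippet below counts matching lines with and without `-i`. The sample text and the variable names are made up for illustration.

```shell
# Hypothetical sample input; any text file would behave the same way.
sample=$(printf '%s\n' 'Error: disk full' 'error: retrying' 'all good')

# Case-sensitive: only the line containing lowercase "error" matches.
exact=$(printf '%s\n' "$sample" | grep -c "error")

# Case-insensitive: "Error" and "error" both match.
any_case=$(printf '%s\n' "$sample" | grep -c -i "error")

echo "case-sensitive: $exact, case-insensitive: $any_case"
```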

Anchors

Anchors match a position rather than a character.
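
| Pattern | Description |
| --- | --- |
| `^` | Start of line |
| `$` | End of line |
| `\b` | Word boundary (GNU extension, not POSIX) |
| `\B` | Non-word boundary (GNU extension, not POSIX) |
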
$ grep "^#" config.txt
$ grep "\.conf$" filelist.txt
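
To see how anchors restrict where a match may occur, the sketch below runs three anchored patterns over a few invented lines. The input and variable names are assumptions for the demonstration only.

```shell
# Hypothetical lines used only for this demonstration.
lines=$(printf '%s\n' '# comment' 'path=/etc/app.conf' 'conf=yes # trailing' 'yes')

# ^# matches only when "#" is the first character of the line.
starts_with_hash=$(printf '%s\n' "$lines" | grep -c '^#')

# \.conf$ matches only when ".conf" is the last thing on the line.
ends_with_conf=$(printf '%s\n' "$lines" | grep -c '\.conf$')

# ^yes$ uses both anchors, so the whole line must be exactly "yes".
exactly_yes=$(printf '%s\n' "$lines" | grep -c '^yes$')

echo "$starts_with_hash $ends_with_conf $exactly_yes"
```

Each count is 1 here: the `#` in the third line is not at the start, and `conf=yes` is not the entire line.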

Character Classes

A character class matches one character from a defined set.

| Pattern | Description |
| --- | --- |
| `.` | Any single character (except newline) |
| `[abc]` | One of a, b, or c |
| `[^abc]` | Any character except a, b, or c |
| `[a-z]` | Any lowercase letter |
| `[A-Z]` | Any uppercase letter |
| `[0-9]` | Any digit |
| `[a-zA-Z0-9]` | Any alphanumeric character |
| `\d` | Any digit (same as `[0-9]`) |
| `\D` | Any non-digit |
| `\w` | Any word character (letter, digit, underscore) |
| `\W` | Any non-word character |
| `\s` | Any whitespace (space, tab, newline) |
| `\S` | Any non-whitespace character |

`\d`, `\w`, and `\s` are Perl-style shortcuts. They work in `grep -P` and most programming languages but are not part of POSIX regex. GNU grep also accepts `\w` and `\s` (but not `\d`) without `-P`.
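
The bracket forms from the table can be tried out with a short sketch like the one below. The word list is invented, and `grep -x` is used so a pattern has to match the whole line.

```shell
# Hypothetical word list for the demonstration.
words=$(printf '%s\n' 'cat' 'cot' 'cut' 'c4t' 'ct')

# c[ao]t: the middle character must be "a" or "o".
set_match=$(printf '%s\n' "$words" | grep -x 'c[ao]t' | paste -sd, -)

# c[^ao]t: the middle character must be anything except "a" or "o".
negated=$(printf '%s\n' "$words" | grep -x 'c[^ao]t' | paste -sd, -)

# c[0-9]t: the middle character must be a digit.
digit=$(printf '%s\n' "$words" | grep -x 'c[0-9]t' | paste -sd, -)

echo "set: $set_match | negated: $negated | digit: $digit"
```

Note that `ct` never matches: a character class always consumes exactly one character.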

POSIX Classes

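POSIX character classes are portable across POSIX-compliant tools. They are only valid inside a bracket expression, so the class `[:digit:]` is written `[[:digit:]]`, and several classes or characters can share one bracket expression, as in `[[:alpha:]_]`.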

| Class | Description |
| --- | --- |
| `[:alpha:]` | Alphabetic characters |
| `[:digit:]` | Digits (0-9) |
| `[:alnum:]` | Alphanumeric characters |
| `[:space:]` | Whitespace characters |
| `[:upper:]` | Uppercase letters |
| `[:lower:]` | Lowercase letters |
| `[:punct:]` | Punctuation characters |
| `[:print:]` | Printable characters |
| `[:blank:]` | Space and tab |

$ grep "[[:digit:]]" data.txt
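
A minimal sketch of POSIX classes combined with anchors and quantifiers follows. The sample data and variable names are illustrative assumptions.

```shell
# Hypothetical sample data.
data=$(printf '%s\n' 'abc' 'ABC123' '42' 'hello world')

# Lines made up entirely of digits.
only_digits=$(printf '%s\n' "$data" | grep -E '^[[:digit:]]+$')

# Lines that are uppercase letters followed by digits and nothing else.
upper_then_digit=$(printf '%s\n' "$data" | grep -E '^[[:upper:]]+[[:digit:]]+$')

# Number of lines containing at least one whitespace character.
with_space=$(printf '%s\n' "$data" | grep -c '[[:space:]]')

echo "$only_digits | $upper_then_digit | $with_space"
```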

Quantifiers

Quantifiers control how many times the preceding element must appear.

| Pattern | Description |
| --- | --- |
| `*` | Zero or more times |
| `+` | One or more times |
| `?` | Zero or one time |
| `{n}` | Exactly n times |
| `{n,}` | n or more times |
| `{n,m}` | Between n and m times |

$ grep -E "o{2,}" words.txt
In basic regex (BRE), `*` works unescaped, but intervals must be written with escaped braces, as in `\{n,m\}`. GNU tools also accept `\+` and `\?` in BRE as extensions. Use `grep -E` for extended regex, where `+`, `?`, and `{n,m}` work without escaping.
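
The following sketch shows an optional character and an interval, with the interval written once for ERE and once for BRE. The word list is made up for the example.

```shell
# Hypothetical word list.
words=$(printf '%s\n' 'color' 'colour' 'colouur' 'god' 'good' 'goood')

# u? makes the "u" optional, so both spellings match but "colouur" does not.
optional=$(printf '%s\n' "$words" | grep -E -x 'colou?r' | paste -sd, -)

# o{2,} requires at least two consecutive "o" characters (ERE syntax).
ere_interval=$(printf '%s\n' "$words" | grep -E 'o{2,}' | paste -sd, -)

# The same interval in basic regex needs escaped braces.
bre_interval=$(printf '%s\n' "$words" | grep 'o\{2,\}' | paste -sd, -)

echo "$optional | $ere_interval | $bre_interval"
```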

Groups and Alternation

Parentheses create groups for applying quantifiers or capturing matches. The pipe symbol provides alternation.
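
| Pattern | Description |
| --- | --- |
| `(abc)` | Group: match "abc" as a unit |
| `a\|b` | Alternation: match a or b (the pipe is escaped here only for the table) |
| `\1` | Backreference: match the first captured group again |
| `\2` | Backreference: match the second captured group |
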
$ echo "abcabc" | grep -E "(abc)\1"
$ grep -E "cat|dog" animals.txt
Backreferences are useful in sed for rearranging matched text.
$ echo "John Smith" | sed -E "s/(.*) (.*)/\2, \1/"
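
As a further sketch of capture groups, the snippet below reorders a date with sed and finds a repeated word with grep. The inputs are invented, and the backreference inside `grep -E` relies on the same behavior as the `(abc)\1` example above.

```shell
# Reorder YYYY-MM-DD into DD/MM/YYYY using three capture groups.
# "#" is used as the sed delimiter so the "/" in the replacement needs no escaping.
reordered=$(echo "2024-01-15" | sed -E 's#([0-9]{4})-([0-9]{2})-([0-9]{2})#\3/\2/\1#')

# Find a doubled word: \2 must repeat exactly what the second group matched.
# The (^| ) guard keeps "this is" from matching through the "is" inside "this".
doubled=$(printf '%s\n' 'this is fine' 'this is is wrong' \
  | grep -E '(^| )([a-z]+) \2( |$)')

echo "$reordered"
echo "$doubled"
```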

Escaping

The backslash `\` removes the special meaning of a metacharacter. The special characters that need escaping depend on the regex flavor.
In extended regex (ERE), these characters are special: `. * + ? ( ) [ ] { } | ^ $ \`
To match a literal dot or any other special character, prefix it with a backslash.
$ grep -E "192\.168\.1\.1" hosts.txt
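
The effect of escaping can be checked with a sketch like this one, which compares an unescaped pattern, an escaped pattern, and `grep -F` (fixed-string matching). The host list is a made-up example.

```shell
# Hypothetical host list; the second entry contains no dots at all.
hosts=$(printf '%s\n' '192.168.1.1' '192x168y1z1' '192.168.1.10')

# Unescaped dots match any character, so all three lines match.
unescaped=$(printf '%s\n' "$hosts" | grep -c -E '192.168.1.1')

# Escaped dots match only literal dots.
escaped=$(printf '%s\n' "$hosts" | grep -c -E '192\.168\.1\.1')

# grep -F treats the pattern as a fixed string, so no escaping is needed.
fixed=$(printf '%s\n' "$hosts" | grep -c -F '192.168.1.1')

echo "unescaped=$unescaped escaped=$escaped fixed=$fixed"
```

The third line still matches in the escaped and fixed cases because the pattern is not anchored, so `192.168.1.1` is found inside `192.168.1.10`.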

Basic vs Extended Regex

Linux tools support two main regex flavors.
Basic Regular Expressions (BRE) are the default for [grep](/man/grep) and [sed](/man/sed). In BRE, the characters `+`, `?`, `{`, `}`, `(`, `)`, and `|` are treated as literals — you must escape them with `\` to use their special meaning.
Extended Regular Expressions (ERE) treat those characters as special by default. Use the `-E` flag with grep or sed to enable ERE; awk uses ERE by default.
$ grep -E "error|warning" logfile
$ sed -E "s/[0-9]+/NUM/g" data.txt
Perl-compatible regex (PCRE) adds features like lookahead, lookbehind, and non-greedy quantifiers. Use `grep -P` where available.
$ grep -P "\d{3}-\d{4}" contacts.txt
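
To make the BRE/ERE difference concrete, this sketch runs the same patterns under both flavors. The log lines and variable names are assumptions for illustration.

```shell
# Hypothetical log lines.
log=$(printf '%s\n' 'error: disk' 'warning: temp' 'info: ok')

# ERE: "|" is alternation, so two lines match.
ere_count=$(printf '%s\n' "$log" | grep -c -E 'error|warning')

# BRE: an unescaped "|" is a literal pipe character, so nothing matches.
# grep exits non-zero when it finds no match, hence the "|| true".
bre_count=$(printf '%s\n' "$log" | grep -c 'error|warning' || true)

# ERE in sed: "+" means one or more.
sed_ere=$(echo 'id 42 and 7' | sed -E 's/[0-9]+/NUM/g')

# A portable BRE spelling of the same idea: one digit, then zero or more digits.
sed_bre=$(echo 'id 42 and 7' | sed 's/[0-9][0-9]*/NUM/g')

echo "ere=$ere_count bre=$bre_count"
echo "$sed_ere"
echo "$sed_bre"
```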

Common Examples

Match lines that look like an email address.
$ grep -E "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}" contacts.txt
Match text shaped like an IPv4 address (this does not check that each octet is in the range 0-255).
$ grep -E "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" logfile
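
If octet ranges matter, one possible stricter variant is sketched below. It builds the pattern from a reusable `octet` fragment and anchors it to the whole line; the variable names and test addresses are illustrative assumptions.

```shell
# One octet: 250-255, 200-249, 100-199, 10-99, or 0-9.
octet='(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])'

# Four octets separated by literal dots, anchored to the whole line.
# Inside double quotes, "\\." yields "\." and "\$" yields a literal "$".
ipv4="^${octet}(\\.${octet}){3}\$"

# Hypothetical candidates: two valid addresses and two with out-of-range octets.
valid=$(printf '%s\n' '192.168.1.1' '999.1.1.1' '10.0.0.256' '8.8.8.8' \
  | grep -E "$ipv4" | paste -sd, -)

echo "$valid"
```

Because of the anchors, this only accepts lines that consist of an address and nothing else; drop `^` and `$` to search inside longer lines, at the cost of matching parts of longer digit runs.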
Remove blank lines from a file.
$ sed '/^$/d' file.txt
Remove lines starting with a comment character.
$ sed "/^#/d" config.txt
Extract the third column from whitespace-separated data.
$ awk '{print $3}' data.txt
Replace multiple spaces with a single space.
$ sed -E "s/ +/ /g" messy.txt
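
Several of these one-liners can be chained in a single sed call. The sketch below, using an invented config snippet, drops comment lines and blank lines and then squeezes runs of spaces.

```shell
# Hypothetical messy input: a comment, blank lines, and uneven spacing.
messy=$(printf '%s\n' '# settings' '' 'name   =   demo' '' 'port =  8080')

# Each -e adds one sed command; they run in order on every line.
cleaned=$(printf '%s\n' "$messy" | sed -E -e '/^#/d' -e '/^$/d' -e 's/ +/ /g')

echo "$cleaned"
```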
