LinuxCommandLibrary

tr

Translate or delete characters from a stream

TLDR

Replace all occurrences of a character in a file, and print the result

$ tr [find_character] [replace_character] < [path/to/file]

Replace all occurrences of a character from another command's output
$ echo [text] | tr [find_character] [replace_character]

Map each character of the first set to the corresponding character of the second set
$ tr '[abcd]' '[jkmn]' < [path/to/file]

Delete all occurrences of the specified set of characters from the input
$ tr [[-d|--delete]] '[input_characters]' < [path/to/file]

Compress a series of identical characters to a single character
$ tr [[-s|--squeeze-repeats]] '[input_characters]' < [path/to/file]

Translate the contents of a file to upper-case
$ tr "[:lower:]" "[:upper:]" < [path/to/file]

Strip out non-printable characters from a file (this also removes newlines and tabs, which are not in [:print:])
$ tr [[-cd|--complement --delete]] "[:print:]" < [path/to/file]

SYNOPSIS

tr [OPTION]... SET1 [SET2]
tr -d [OPTION]... SET1
tr -s [OPTION]... SET1
tr -ds [OPTION]... SET1 SET2

PARAMETERS

SET1
    The set of characters to be translated, deleted, or squeezed. Can be literal characters, ranges (e.g., 'a-z'), octal values, or POSIX character classes.

SET2
    The set of characters to which characters from SET1 are translated. Required for translation operations. If shorter than SET1, the last character of SET2 is repeated (unless -t is used).

-c, --complement
    Complement SET1, so that the operation applies to every character that is not in SET1.

-d, --delete
    Delete characters in SET1 from the input. Without -s, only SET1 may be given; GNU tr rejects an extra SET2 operand. With -s, SET2 is the set of characters to squeeze after deletion.

-s, --squeeze-repeats
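    Replace each sequence of a repeated character with a single occurrence of that character. The characters affected are those in the last set given: SET1 when -s is used alone, or SET2 when SET2 is also specified, in which case characters are first translated (or deleted, with -d) and then squeezed.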

-t, --truncate-set1
    Truncate SET1 to the length of SET2 before translating, so that the last character of SET2 is not repeated when SET1 is longer. This is a GNU extension.
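The following shell sketch shows how the set lengths and options above interact in practice. It assumes GNU coreutils tr (needed for -t); the input strings are arbitrary examples chosen for illustration.

```shell
#!/bin/sh
# Assumes GNU coreutils tr; -t is a GNU extension.

# SET2 shorter than SET1: its last character ('y') is repeated for c and d.
echo abcd | tr abcd xy                      # prints: xyyy

# -t truncates SET1 to "ab", so 'c' and 'd' pass through unchanged.
echo abcd | tr -t abcd xy                   # prints: xycd

# -c -d: delete every character NOT in 0-9 (the newline included).
echo 'tel: 555-0100' | tr -cd '0-9'; echo   # prints: 5550100

# -s with SET2: translate '.' to ' ', then squeeze runs of ' ' (SET2).
echo 'a..b...c' | tr -s '.' ' '             # prints: a b c

# -d -s: delete '_' (SET1), then squeeze runs of 'a' or 'b' (SET2).
echo 'aa__bb' | tr -ds '_' 'ab'             # prints: ab
```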

DESCRIPTION

The tr command, whose name is short for "translate," is a command-line utility in Unix-like operating systems for character-level manipulation. It reads from standard input, performs operations on individual characters, and writes the result to standard output. It does not take file name arguments, so a file must be supplied through input redirection (< file) or a pipe.

Its primary functions are to translate characters from one set to another, to delete specific characters from the input stream, and to squeeze runs of a repeated character into a single one. It is commonly used in pipelines to filter or transform text, for example to convert case (uppercase to lowercase or vice versa) or to remove unwanted characters. Unlike sed or awk, tr operates purely on characters rather than on lines or fields, which keeps it simple and fast for character-level transformations.
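As a rough illustration of tr acting as a pipeline filter, the sketch below chains two tr calls inside a small shell function. The name slugify is hypothetical and not part of tr. Note that tr cannot trim the trailing '-' produced by the final '!'; that would need another tool such as sed.

```shell
#!/bin/sh
# Hypothetical helper (not part of tr): build a URL-style "slug".
slugify() {
    # 1. Fold uppercase letters to lowercase.
    # 2. Map every non-alphanumeric character to '-' and squeeze runs of '-'.
    printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -cs '[:alnum:]' '-'
}

slugify 'Hello, World!'; echo    # prints: hello-world-
```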

CAVEATS

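  • tr processes input as a stream of characters, not line by line: a newline is just another character, and there is no support for regular expressions or patterns as in sed or awk. GNU tr also works on bytes, so multibyte characters (such as non-ASCII UTF-8 characters) are not handled as single units.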
  • If SET2 is shorter than SET1 (and -t is not used), the last character of SET2 is reused for the remaining characters in SET1. If SET2 is longer, the extra characters in SET2 are ignored.
  • Locale settings can influence how character ranges (e.g., 'a-z') are interpreted. For consistent behavior, it's often recommended to set LC_ALL=C.
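A short sketch of two of these caveats, assuming a POSIX shell with printf: the newline is translated like any other character, and LC_ALL=C pins the meaning of the 'a-z' range as recommended above.

```shell
#!/bin/sh
# Newline is an ordinary character to tr: join lines with commas.
printf 'a\nb\nc\n' | tr '\n' ','; echo        # prints: a,b,c,

# Run tr in the C locale so that 'a-z' means the ASCII lowercase letters.
echo 'Mixed Case' | LC_ALL=C tr 'a-z' 'A-Z'   # prints: MIXED CASE
```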

CHARACTER SETS AND RANGES

tr interprets SET1 and SET2 as character sets, which can be defined in several ways:

  • Literal characters: E.g., 'abc'.
  • Ranges: E.g., 'a-z' (all lowercase letters from 'a' to 'z'), '0-9' (all digits).
  • Octal values: E.g., '\101' for character 'A'.
  • Special escape sequences: '\n' (newline), '\t' (tab), '\\' (backslash), etc.
  • POSIX character classes: These are written as [:name:], e.g., '[:lower:]'. Common classes include:
    [:alnum:] (alphanumeric characters)
    [:alpha:] (alphabetic characters)
    [:blank:] (space and tab characters)
    [:cntrl:] (control characters)
    [:digit:] (decimal digits)
    [:graph:] (printable characters excluding space)
    [:lower:] (lowercase letters)
    [:print:] (printable characters including space)
    [:punct:] (punctuation characters)
    [:space:] (all whitespace characters)
    [:upper:] (uppercase letters)
    [:xdigit:] (hexadecimal digits)
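The sketch below exercises several of these set notations on arbitrary sample input: an octal escape, a tab escape, a character class, and two ranges combined into the well-known ROT13 mapping. LC_ALL=C is set for the range example, in line with the locale caveat.

```shell
#!/bin/sh
# Octal escape: \101 is 'A', so every 'A' becomes '*'.
echo 'BANANA' | tr '\101' '*'                           # prints: B*N*N*

# Escape sequence: turn a tab into a space.
printf 'a\tb\n' | tr '\t' ' '                           # prints: a b

# Character class: delete all decimal digits.
echo 'r2d2' | tr -d '[:digit:]'                         # prints: rd

# Ranges in both sets: ROT13 rotates each letter by 13 places.
echo 'Hello' | LC_ALL=C tr 'A-Za-z' 'N-ZA-Mn-za-m'      # prints: Uryyb
```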

COMMON USE CASES

  • Case Conversion: Convert input from lowercase to uppercase or vice versa.
    Example: echo "Hello World" | tr '[:lower:]' '[:upper:]' -> HELLO WORLD
  • Removing Characters: Delete specific unwanted characters from the input stream.
    Example: echo "a-b-c" | tr -d '-' -> abc
  • Squeezing Repeats: Replace sequences of repeated characters with a single instance.
    Example: echo "Heeellooo" | tr -s 'eo' -> Hello (the 'll' is kept because 'l' is not in the set)
  • Replacing Newlines: Convert multi-line input into a single line or vice versa.
    Example: tr '\n' ' ' < file.txt (replaces newlines with spaces)
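These use cases combine naturally. This sketch builds a simple word-frequency count from tr, sort and uniq; word_freq is a hypothetical helper name. The complement-and-squeeze idiom (tr -cs) turns each run of non-letters into a single newline, putting every word on its own line.

```shell
#!/bin/sh
# Hypothetical helper: count word occurrences on standard input.
word_freq() {
    tr -cs '[:alpha:]' '\n' |       # one word per line
        tr '[:upper:]' '[:lower:]' | # fold case so "The" and "the" match
        sort | uniq -c | sort -rn    # count, most frequent first
}

printf 'The cat saw the dog.\n' | word_freq
```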

HISTORY

The tr command has been a standard Unix utility since the earliest versions of Unix. It was part of the original AT&T Unix system, making it one of the oldest and most fundamental command-line tools. Its design reflects the Unix philosophy of small, specialized tools that do one thing well and can be combined using pipes. The core functionality has remained largely consistent over decades, with minor enhancements and standardization efforts (like POSIX).

SEE ALSO

sed(1), awk(1), cut(1), grep(1)
