
llvm-strings

Print the printable strings found in files

TLDR

View documentation for the original command

$ tldr strings

SYNOPSIS

llvm-strings [options] <input files...>

PARAMETERS

-a, --all
    Silently ignored; accepted for compatibility with GNU strings. llvm-strings always scans the entire file.

-f, --print-file-name
    Print the name of the file before each string found.

-n <length>, --bytes=<length>
    Print only sequences of at least <length> printable characters. The default is 4.

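-o
    Not listed in the LLVM documentation for llvm-strings. In GNU strings, -o is shorthand for -t o (octal offsets), while some other implementations treat it as -t d; use -t or --radix to choose the radix explicitly.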

-t <radix>, --radix=<radix>
    Print the offset of each string within the file before the string, in the specified radix: 'o' (octal), 'd' (decimal), or 'x' (hexadecimal).

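-e <encoding>, --encoding=<encoding>
    A GNU strings option that is not listed in the LLVM documentation for llvm-strings, which looks for printable ASCII characters only. In GNU strings the values are 's' (7-bit single-byte characters, the default), 'S' (8-bit single-byte characters), 'b' (16-bit big-endian), 'l' (16-bit little-endian), 'B' (32-bit big-endian), and 'L' (32-bit little-endian).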

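--section=
    Not listed in the LLVM documentation for llvm-strings, which scans the whole input file regardless of its format and does not restrict the search to particular sections (e.g., '.text', '.data').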

--version
    Display the version of the llvm-strings executable.

-h, --help
    Display a summary of command-line options.
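
The effect of -n and -t can be pictured with a small Python sketch. It is not the LLVM implementation: the names find_strings and format_line are hypothetical, the printable set (bytes 0x20 to 0x7E plus tab) is an assumption, and so is the 7-character offset column.

```python
def find_strings(data, min_len=4):
    """Yield (offset, text) for each run of printable ASCII at least min_len long."""
    start = None
    for i, byte in enumerate(data):
        # Assumption: printable means 0x20..0x7E, plus the tab character.
        printable = 0x20 <= byte <= 0x7E or byte == 0x09
        if printable:
            if start is None:
                start = i          # a new run begins here
        else:
            if start is not None and i - start >= min_len:
                yield start, data[start:i].decode("ascii")
            start = None           # any other byte ends the run
    # A run that reaches the end of the data is also reported.
    if start is not None and len(data) - start >= min_len:
        yield start, data[start:].decode("ascii")


def format_line(offset, text, radix=None):
    """Prefix text with its offset in radix 'o', 'd' or 'x' (no prefix if None)."""
    if radix is None:
        return text
    if radix not in ("o", "d", "x"):
        raise ValueError("radix must be 'o', 'd' or 'x'")
    # Assumption: right-aligned 7-character offset column.
    return f"{offset:>7{radix}} {text}"


blob = b"\x00\x01abc\x00hello\x00\xffworld!!\x02"

if __name__ == "__main__":
    for offset, text in find_strings(blob, min_len=4):
        print(format_line(offset, text, radix="x"))
```

With the default minimum of 4, the three-character run "abc" is skipped; lowering min_len to 3 would report it at offset 2.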

DESCRIPTION

llvm-strings is a utility from the LLVM project intended as a drop-in replacement for GNU strings(1). It looks for printable strings in each input file and writes them to standard output. A printable string is any sequence of four (by default) or more printable ASCII characters; the end of the file or any other byte terminates the current sequence. Unlike GNU strings, it scans the entire input file regardless of format instead of restricting the search to certain sections of object files, so LLVM bitcode, ELF, Mach-O, and COFF files are all handled the same way. This makes it useful in reverse engineering, debugging, and security analysis for finding embedded text such as error messages, URLs, and configuration strings anywhere in a binary. Options control the minimum string length and whether file names and offsets are printed.

CAVEATS

llvm-strings detects only runs of printable ASCII characters, so it does not recover obfuscated, compressed, or encrypted strings. Output can be very long for large binaries and often needs filtering (for example with grep) or redirection to a file.
Because it searches for contiguous sequences of printable characters, any non-printable byte ends the current string, which can split one logical string into several pieces or drop pieces shorter than the minimum length.
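
The splitting behaviour can be reproduced with a short Python sketch. It only approximates the scan with a regular expression: the helper name printable_runs is hypothetical, and the printable set (0x20 to 0x7E plus tab) is an assumption rather than something taken from the LLVM sources.

```python
import re


def printable_runs(data, min_len=4):
    """Return every run of printable ASCII bytes that is at least min_len long."""
    # Assumption: printable means 0x20..0x7E plus tab; newline is NOT included.
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# A newline inside a usage message splits it into two separate strings.
usage = b"Usage: tool\n  -h  help"

# "GET" is only 3 bytes long, so it is dropped at the default minimum of 4.
request = b"GET\x00/index.html"

if __name__ == "__main__":
    print(printable_runs(usage))
    print(printable_runs(request))
    print(printable_runs(request, min_len=3))
```

Lowering the minimum length recovers short fragments such as "GET", at the cost of more noise from random byte sequences.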

SUPPORTED FILE TYPES

llvm-strings does not parse its input: every file is scanned as a plain sequence of bytes. It can therefore be run on any file, including LLVM bitcode files and ELF (Executable and Linkable Format), Mach-O (macOS/iOS), and COFF (Common Object File Format) executables, shared libraries, and object files.

CHARACTER ENCODING SUPPORT

According to the LLVM documentation, llvm-strings recognizes printable ASCII characters only. ASCII text inside UTF-8 data is found, but non-ASCII UTF-8 bytes end the current string, and UTF-16 or UTF-32 text is generally not reported because its ASCII-range characters are separated by zero bytes. For wide character sets, GNU strings provides the -e option.
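
As a hedged illustration of why encoding matters to a byte-oriented scan, the Python sketch below applies an ASCII-only scanner (which is how the LLVM documentation describes llvm-strings) to the same text stored as ASCII and as UTF-16. The helper ascii_strings and the sample text are hypothetical, and the printable set is an assumption.

```python
import re

# Assumption: printable means 0x20..0x7E plus tab, with a minimum run of 4.
ASCII_RUN = re.compile(rb"[\x20-\x7e\t]{4,}")


def ascii_strings(data):
    """Return the runs of printable ASCII found in data."""
    return [m.group().decode("ascii") for m in ASCII_RUN.finditer(data)]


text = "license key"
as_ascii = text.encode("ascii")      # one byte per character
as_utf16 = text.encode("utf-16-le")  # each character is followed by a zero byte

if __name__ == "__main__":
    print(ascii_strings(as_ascii))   # the text is found as one string
    print(ascii_strings(as_utf16))   # no run reaches 4 printable bytes
```

In the UTF-16 case every printable byte is followed by 0x00, so no run is longer than one byte and nothing is reported.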

HISTORY

llvm-strings is part of the LLVM project, which began in 2000 at the University of Illinois at Urbana-Champaign. It belongs to the set of LLVM counterparts to the GNU binutils tools and is intended as a drop-in replacement for GNU strings, so that an LLVM-based toolchain can provide a strings command of its own. It does not interpret LLVM-specific formats such as bitcode; it scans them as raw bytes like any other file.

SEE ALSO

strings(1), llvm-objdump(1), llvm-readobj(1), llvm-nm(1)
