llvm-mc

Assemble and disassemble machine code for LLVM-supported targets

TLDR

Assemble an assembly file into an object file

$ llvm-mc --filetype=obj -o [path/to/output.o] [path/to/input.s]

Disassemble a text file of hex byte values (e.g., 0xCD 0x21) into assembly code

$ llvm-mc --disassemble -o [path/to/output.s] [path/to/input.txt]

Parse an assembly file and print it back as assembly (the default -filetype=asm; LLVM bitcode is compiled with llc, not llvm-mc)

$ llvm-mc -o [path/to/output.s] [path/to/input.s]

Assemble code from stdin and print each instruction with its encoding and internal representation

$ echo "[addl %eax, %ebx]" | llvm-mc -show-encoding -show-inst
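When a script consumes output from a command like the one above, it can extract the encoding comments programmatically. The Python sketch below uses a hypothetical helper, `parse_encodings`, which is not part of LLVM. It assumes that -show-encoding annotates each instruction with a comment of the form `# encoding: [0x01,0xc3]`, which is how llvm-mc typically prints x86 output. It also assumes that non-hex entries mark bytes still waiting on a fixup.

```python
import re
from typing import List, Optional

# Assumed shape of an llvm-mc -show-encoding line, e.g.
#   addl %eax, %ebx    # encoding: [0x01,0xc3]
# The comment marker differs between targets (# on x86, // on AArch64),
# so only the "encoding: [...]" part is matched.
ENCODING_RE = re.compile(r"encoding:\s*\[([^\]]*)\]")


def parse_encodings(text: str) -> List[List[Optional[int]]]:
    """Return one list of byte values per encoded instruction in `text`.

    Entries that are not hex literals (assumed to be placeholder bytes
    that llvm-mc prints for unresolved fixups) are returned as None.
    """
    results = []
    for match in ENCODING_RE.finditer(text):
        encoding = []
        for item in match.group(1).split(","):
            item = item.strip()
            if not item:
                continue  # tolerate an empty "[]" or a trailing comma
            if item.lower().startswith("0x"):
                encoding.append(int(item, 16))
            else:
                encoding.append(None)
        results.append(encoding)
    return results
```

Lines without an encoding comment, such as the `<MCInst ...>` lines that -show-inst adds, do not match the pattern and are ignored.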

Disassemble hex-encoded machine code from stdin for a specified target triple

$ echo "[0xCD 0x21]" | llvm-mc --disassemble -triple=[target_triple]
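Because --disassemble reads bytes written as text, a raw binary blob must be converted before it can be piped in. The following is a minimal Python sketch using a hypothetical helper name, `to_disassembler_input`. It assumes, based on our understanding of llvm-mc's input reader, that each input line is decoded as an independent byte sequence. For that reason it keeps all bytes on a single line so that no instruction is split.

```python
def to_disassembler_input(data: bytes) -> str:
    """Render raw bytes as space-separated hex literals on one line.

    llvm-mc --disassemble expects text such as "0xcd 0x21" rather than a
    binary file. All bytes stay on one line because wrapping could split
    a multi-byte instruction across separately decoded lines (assumed
    behavior, see the lead-in).
    """
    return " ".join(f"0x{b:02x}" for b in data) + "\n"
```

The resulting string could then be piped to a command such as `llvm-mc --disassemble -triple=x86_64-linux-gnu`.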

-assemble
    Assemble the input file. This is the default operation if -disassemble is not specified.

-disassemble
    Disassemble the input file, interpreting its contents as machine code written as textual byte values (for example 0xCD 0x21), not as a binary object file.

-arch
    Specify the target architecture for assembly or disassembly, e.g., x86, aarch64, arm. Run -version to list the targets available in your build.

-mcpu
    Specify the target CPU, which determines the instruction set extensions accepted during assembly or recognized during disassembly, e.g., skylake for x86, cortex-a72 for AArch64.

-filetype
    Specify the output file type. Accepted values are asm (textual assembly, the default), obj (object file), and null (discard output).

-o
    Specify the output filename. If not provided, output is sent to standard output.

-show-encoding
    Show the encoding of each instruction as a comment in the textual assembly output. This works when assembling as well as when disassembling.

-show-inst
    Display the internal LLVM instruction representation (MCInst) alongside each assembly instruction.

-triple
    Specify the target triple, e.g., x86_64-linux-gnu. This implicitly sets the architecture, vendor, OS, and environment.

-help
    Display a summary of command-line options.

-version
    Display the version of llvm-mc.
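Scripts that call llvm-mc often build the command line from the options above. The Python sketch below shows one way to do that. `llvm_mc_args` is a hypothetical helper that covers only the flags documented here, and it uses the `-name=value` spelling that LLVM's command-line parser accepts.

```python
from typing import List, Optional


def llvm_mc_args(
    *,
    disassemble: bool = False,
    triple: Optional[str] = None,
    mcpu: Optional[str] = None,
    filetype: str = "asm",
    output: Optional[str] = None,
    show_encoding: bool = False,
    show_inst: bool = False,
    tool: str = "llvm-mc",
) -> List[str]:
    """Build an llvm-mc argument vector from the documented options."""
    if filetype not in ("asm", "obj", "null"):
        raise ValueError(f"unsupported filetype: {filetype!r}")
    args = [tool]
    # -assemble is the default action; passing it explicitly keeps logs clear.
    args.append("--disassemble" if disassemble else "--assemble")
    if triple:
        args.append(f"-triple={triple}")
    if mcpu:
        args.append(f"-mcpu={mcpu}")
    args.append(f"-filetype={filetype}")
    if output:
        args += ["-o", output]
    if show_encoding:
        args.append("-show-encoding")
    if show_inst:
        args.append("-show-inst")
    return args
```

For example, `llvm_mc_args(triple="x86_64-linux-gnu", filetype="obj", output="out.o")` produces the argument list for assembling to an object file. The input path, or stdin, is appended by the caller.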

DESCRIPTION

llvm-mc is a utility within the LLVM project that functions as both a machine code assembler and disassembler. It provides a command-line interface for converting assembly language source into object files containing machine code, or conversely, for decoding machine code bytes back into human-readable assembly instructions.

This tool is useful for developers working on low-level system programming, compiler development, or security research. It allows precise inspection of instruction encodings across the architectures supported by LLVM because it is built directly on LLVM's MC layer, the target-independent infrastructure that LLVM backends use to encode, decode, and emit machine instructions. It is often used to test new LLVM target backends, to check how individual instructions are encoded, or to understand the low-level representation of compiled code.

CAVEATS

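llvm-mc's functionality depends on the target architecture and CPU, since it relies on the LLVM backend for that target; targets not compiled into your LLVM build are unavailable. Without -triple or -arch, llvm-mc uses the default target triple (usually the host), so specify the target explicitly when disassembling bytes from another architecture. Disassembly input must be textual byte values; to disassemble an object file, use llvm-objdump instead. Note that llvm-mc only produces object files and does not link them into executables.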

TYPICAL USE CASES

llvm-mc is frequently used by LLVM developers to test new target backends or instruction sets, by security researchers to decode individual instruction byte sequences, and by low-level programmers to understand how high-level code is translated into machine instructions. Because it reads from standard input and writes to standard output, it integrates easily into automated scripts and larger toolchains.
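To drive llvm-mc from a script through standard input and output, a thin wrapper around Python's `subprocess.run` is usually enough. In this sketch, `run_llvm_mc` is a hypothetical name. It assumes that llvm-mc is on `PATH` and that, on failure, it reports diagnostics on stderr and exits with a non-zero status.

```python
import shutil
import subprocess
import sys
from typing import Sequence


def run_llvm_mc(stdin_text: str, args: Sequence[str], tool: str = "llvm-mc") -> str:
    """Pipe `stdin_text` into `tool` with `args` and return its stdout.

    Raises FileNotFoundError if the tool cannot be found, and
    subprocess.CalledProcessError if it exits with a non-zero status.
    """
    exe = shutil.which(tool)
    if exe is None:
        raise FileNotFoundError(f"{tool} not found on PATH")
    try:
        completed = subprocess.run(
            [exe, *args],
            input=stdin_text,
            capture_output=True,
            text=True,
            check=True,
        )
    except subprocess.CalledProcessError as err:
        # Surface the tool's diagnostics (written to stderr) before re-raising.
        sys.stderr.write(err.stderr or "")
        raise
    return completed.stdout
```

When llvm-mc is installed, a call such as `run_llvm_mc("addl %eax, %ebx\n", ["-triple=x86_64-linux-gnu", "-show-encoding"])` should return the re-emitted assembly with encoding comments. Passing `tool` explicitly lets the same wrapper target a versioned binary name.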

HISTORY

llvm-mc emerged as a fundamental utility within the LLVM project, developed to provide a consistent interface for machine code assembly and disassembly across LLVM's growing set of supported architectures. Its evolution is closely tied to LLVM's growth as a compiler infrastructure. Rather than relying on traditional, often platform-specific assemblers and disassemblers, it reuses the instruction definitions in each target's TableGen description, so the same tables drive code generation, assembly, and disassembly.