LinuxCommandLibrary

llamafile

Single-file executable for portable LLM inference

TLDR

Run a llamafile (launches chat in terminal and server on port 8080)
$ ./[model].llamafile
Run in server-only mode
$ ./[model].llamafile --server
Run in CLI mode with a prompt
$ ./[model].llamafile --cli -p "[prompt]"
Run interactive chat mode
$ ./[model].llamafile --chat
Load external model weights
$ llamafile -m [path/to/model.gguf]
Set context size and number of threads
$ ./[model].llamafile -c [8192] -t [8] -p "[prompt]"
Run server on a specific host and port
$ ./[model].llamafile --server --host [0.0.0.0] --port [8080]
Offload layers to GPU and set temperature
$ ./[model].llamafile -ngl [999] --temp [0.7] -p "[prompt]"

SYNOPSIS

llamafile [options]

DESCRIPTION

llamafile is a single-file executable that bundles llama.cpp with model weights for portable LLM inference. Built on Cosmopolitan Libc, the same file runs on Linux, macOS, Windows, FreeBSD, NetBSD, and OpenBSD without installation.

By default, llamafile launches both a terminal chatbot and an HTTP server with a web UI on port 8080. It can also run in dedicated CLI (--cli), chat (--chat), or server (--server) modes.
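In server mode, llamafile also exposes an OpenAI-compatible HTTP API alongside the web UI. A minimal sketch of a chat request against a server on the default port; the "model" value is a placeholder, since the server answers with whatever model it was started with:

```shell
# Build an OpenAI-style chat completion request body.
cat > request.json <<'EOF'
{
  "model": "local",
  "messages": [
    {"role": "user", "content": "Write a haiku about mountains."}
  ],
  "temperature": 0.7
}
EOF

# Sanity-check that the payload is valid JSON.
python3 -m json.tool request.json

# POST it to a running llamafile server (not run here; requires
# a llamafile started with --server on port 8080):
# curl -s http://localhost:8080/v1/chat/completions \
#      -H "Content-Type: application/json" -d @request.json
```

The response follows the OpenAI chat completion schema, so existing OpenAI client libraries can usually be pointed at the local server by changing the base URL.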

PARAMETERS

-m model
Path to model weights file (if not embedded in the llamafile).
-p prompt
Input prompt text.
--cli
Run in CLI mode, answering a single prompt.
--chat
Run interactive chat mode with slash commands.
--server
Start HTTP server mode with web UI.
-c size
Context window size in tokens.
-t threads
Number of threads to use for computation.
-n count
Maximum number of tokens to generate.
-ngl n
Number of layers to offload to GPU.
--host addr
Server listening address (default: 127.0.0.1).
--port port
Server port (default: 8080).
--temp value
Sampling temperature (higher = more random).
--top-k n
Top-k sampling (default: 40).
--top-p value
Top-p nucleus sampling (default: 0.95).
--seed n
Random seed for reproducible output.
--grammar grammar
Apply BNF grammar to constrain output format.
--mmproj file
Multimodal projection model weights for vision models.
--image file
Image file input for multimodal models.
--jinja
Enable Jinja template support for chat templates.
-e
Process escape sequences (\n, \t, etc.) in the prompt.

CAVEATS

File sizes can be large (several GB). Requires chmod +x on Unix systems. Apple Silicon may require code signing. Models are memory-mapped for efficiency.
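On Unix systems the downloaded file must be marked executable before it can run. A sketch, using an empty placeholder file to stand in for a real multi-gigabyte download:

```shell
# Placeholder standing in for a downloaded llamafile.
touch model.llamafile

# Mark it executable so the shell will run it.
chmod +x model.llamafile

# Verify the executable bit is set.
ls -l model.llamafile
```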

HISTORY

llamafile was created by Justine Tunney at Mozilla in 2023, combining Cosmopolitan Libc's universal binary format with llama.cpp.

SEE ALSO

llama.cpp(1), ollama(1)
