
llamafile

TLDR

Run llamafile

$ ./[model].llamafile
Run with web UI (HTTP API example below)
$ ./[model].llamafile --server
Generate from prompt
$ ./[model].llamafile -p "[prompt]"
Interactive chat mode
$ ./[model].llamafile -i
Set context size
$ ./[model].llamafile -c [8192] -p "[prompt]"
Specify threads
$ ./[model].llamafile -t [8] -p "[prompt]"
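
When started with --server, llamafile also exposes an HTTP API modeled on the OpenAI chat completions endpoint. A minimal sketch, assuming the default port 8080 (the model field value is illustrative):

$ # Query the OpenAI-compatible endpoint of a running llamafile server
$ curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "llamafile", "messages": [{"role": "user", "content": "[prompt]"}]}'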

SYNOPSIS

llamafile [options]

DESCRIPTION

llamafile is a single-file executable that bundles llama.cpp with a model for portable LLM inference. The same file runs on Linux, macOS, Windows, and BSD without installation.
Each llamafile is self-contained and includes a web UI for chat interactions.
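
A typical first run looks like this (the download URL is a placeholder):

$ # Fetch a prebuilt llamafile, mark it executable, and start it
$ curl -LO https://example.com/[model].llamafile
$ chmod +x [model].llamafile
$ ./[model].llamafile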

PARAMETERS

-m model
Model file to load (needed only if no model is embedded).
-p prompt
Prompt text to generate from.
-i
Interactive chat mode.
--server
Start the built-in web server (chat UI and HTTP API).
-c size
Context window size in tokens.
-t threads
Number of CPU threads to use.
-ngl n
Number of model layers to offload to the GPU.
--port port
Web server port (default 8080).
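
These options combine; for example, a sketch of serving a model on a non-default port with GPU offload (the values are illustrative):

$ ./[model].llamafile --server --port 8081 -ngl 35 -c 4096 -t 8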

CREATING LLAMAFILE

$ # Bundle the GGUF model into the llamafile executable
$ zipalign -j0 llamafile model.gguf
$ chmod +x llamafile
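
The archive can also carry a .args file that supplies default command-line arguments, one per line. A sketch, assuming the weights were embedded as model.gguf:

$ # Write default arguments (one per line) and embed them
$ printf '%s\n' -m model.gguf -c 4096 > .args
$ zipalign -j0 llamafile .args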

CAVEATS

Llamafiles can be several gigabytes, since the model weights are embedded in the executable. On Unix-like systems the file must first be marked executable with chmod +x. Apple Silicon requires the binary to be signed before it will run. Windows cannot execute files larger than 4 GB, so bigger models must be kept external and loaded with -m. Model weights are memory-mapped, so start-up is fast and memory can be shared between processes.
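
For example, keeping the weights external (useful on Windows or when swapping models):

$ ./llamafile -m [model].gguf -p "[prompt]"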

HISTORY

llamafile was created by Justine Tunney at Mozilla in 2023, combining Cosmopolitan Libc's universal binary format with llama.cpp.

SEE ALSO

llama.cpp