
llamafile

Single-file executable that bundles llama.cpp with a model

TLDR

Run llamafile

$ ./[model].llamafile
Run with web UI
$ ./[model].llamafile --server
Generate from prompt
$ ./[model].llamafile -p "[prompt]"
Interactive chat mode
$ ./[model].llamafile -i
Set context size
$ ./[model].llamafile -c [8192] -p "[prompt]"
Specify threads
$ ./[model].llamafile -t [8] -p "[prompt]"
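
The --server examples above listen on localhost port 8080 by default; --port changes this (see PARAMETERS). The server also exposes an OpenAI-compatible /v1/chat/completions endpoint, so it can be queried with curl. A sketch, assuming default settings; the model field is a placeholder the embedded server typically ignores:

$ ./[model].llamafile --server --port [8081]
$ curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "local", "messages": [{"role": "user", "content": "Hello"}]}'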

SYNOPSIS

llamafile [options]

DESCRIPTION

llamafile is a single-file executable that bundles llama.cpp with a model for portable LLM inference. The same file runs on Linux, macOS, Windows, and BSD without installation.
Llamafiles are self-contained and include a built-in web UI for chat interactions.
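
As a concrete first run, assuming a published llamafile downloaded from a hypothetical URL:

$ wget https://example.com/[model].llamafile
$ chmod +x [model].llamafile
$ ./[model].llamafile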

PARAMETERS

-m model
Model file to load (when no model is embedded in the executable).
-p prompt
Input prompt.
-i
Interactive chat mode.
--server
Start the built-in web server.
-c size
Context window size in tokens.
-t threads
Number of CPU threads to use.
-ngl n
Number of layers to offload to the GPU.
--port port
Port for the web server.

CREATING A LLAMAFILE

$ # Bundle a GGUF model into the stock llamafile runtime binary
$ # (zipalign here is the tool shipped with the llamafile project, not Android's)
zipalign -j0 llamafile model.gguf
chmod +x llamafile
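
The project's zipalign can also embed a special .args file holding default command-line arguments, one per line, which llamafile reads at startup. A minimal sketch, embedding it in the same invocation as the model above:

$ # Default flags: load the embedded model and listen on all interfaces
printf '%s\n' -m model.gguf --host 0.0.0.0 > .args
zipalign -j0 llamafile model.gguf .args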

CAVEATS

Llamafiles with embedded weights can be several gigabytes in size. On Unix-like systems the file must first be marked executable with chmod +x. Apple Silicon requires the binary to be signed. Model weights are memory-mapped for efficiency.

HISTORY

llamafile was created by Justine Tunney at Mozilla in 2023, combining Cosmopolitan Libc's universal binary format with llama.cpp.

SEE ALSO

> TERMINAL_GEAR

Curated for the Linux community

Copied to clipboard

> TERMINAL_GEAR

Curated for the Linux community